Consider a scenario where you are supposed to determine if a person has heart disease or not based on the following attributes: blood pressure, body weight, age, ethnicity, height, number of cigarettes consumed in a week, and type of job (which can take four values: business, healthcare, engineering, or education).
a) State one strength and one weakness of kNN for this task.
b) State one strength and one weakness of decision trees for this task.
c) What aspects of this problem might lead you to choose RIPPER over Decision Trees?
Solution
a) kNN strength:
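kNN is easy to implement and can be accurate because it makes no assumption about the shape of the decision boundary. With a sufficiently large k, it is also fairly insensitive to isolated outlier values.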
kNN weakness:
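kNN needs a meaningful distance function, which is hard to define here: the numeric attributes (blood pressure, body weight, age, height, cigarettes per week) are on different scales and must be normalized, and the categorical attributes (ethnicity, job type) have no natural ordering. kNN is also costly and slow at prediction time, because each query must be compared against the stored training records.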
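To make the distance-function issue concrete, here is a minimal sketch of one possible mixed-attribute distance: numeric attributes are range-normalized and categorical attributes use simple matching (0 if equal, 1 otherwise). The attribute names, the value ranges, and the helper names `mixed_distance` and `knn_predict` are assumptions made for illustration; they do not come from the question.

```python
# Hypothetical attribute names and value ranges, chosen only for illustration.
NUMERIC_RANGES = {
    "blood_pressure": (80.0, 200.0),
    "weight_kg": (40.0, 160.0),
    "age": (18.0, 90.0),
    "height_cm": (140.0, 210.0),
    "cigarettes_per_week": (0.0, 280.0),
}
CATEGORICAL = ("ethnicity", "job")


def mixed_distance(a, b):
    """Average per-attribute distance between two patient records.

    The result lies in [0, 1] as long as the numeric values stay inside
    the assumed ranges above.
    """
    total = 0.0
    for name, (lo, hi) in NUMERIC_RANGES.items():
        # Range-normalize so no attribute dominates because of its units.
        total += abs(a[name] - b[name]) / (hi - lo)
    for name in CATEGORICAL:
        # Simple matching: 0 if the categories agree, 1 otherwise.
        total += 0.0 if a[name] == b[name] else 1.0
    return total / (len(NUMERIC_RANGES) + len(CATEGORICAL))


def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query.

    `train` is a list of (record, has_heart_disease) pairs.
    """
    neighbors = sorted(train, key=lambda pair: mixed_distance(pair[0], query))[:k]
    votes = sum(1 for _, label in neighbors if label)
    return votes * 2 > len(neighbors)
```

Note that `knn_predict` scans and sorts the whole training set for every query, which is exactly the prediction-time cost mentioned above.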
b) Decision Tree strength:
Decision trees handle discrete attributes naturally, and this data set has two of them (ethnicity and job type). They also do not require the numeric attributes to be scaled.
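As an illustration of why discrete attributes fit decision trees well, the sketch below scores a multiway split on a categorical attribute with weighted Gini impurity, using the category values directly. Gini is only one common split criterion, and the function names `gini` and `split_gini` are hypothetical, introduced here for the example.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(records, attribute):
    """Weighted Gini impurity after a multiway split on a categorical attribute.

    `records` is a list of (features, label) pairs, where `features` is a dict.
    One child node is created per distinct value of `attribute`.
    """
    groups = defaultdict(list)
    for features, label in records:
        groups[features[attribute]].append(label)
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())
```

A tree learner would compute such a score for each candidate attribute and split on the one with the lowest impurity; no distance function or scaling is involved.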
Decision tree weakness:
Decision trees tend to grow large quickly and then overfit the training data unless they are pruned. They can also be complex to implement.
c) Why RIPPER over decision trees?
RIPPER produces a rule set that is as easy to interpret as a decision tree. However, its pruning phase simplifies each rule as it is learned, which can yield a less complex model than a decision tree while also guarding against overfitting.
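The pruning idea can be sketched as follows. This is a simplified illustration, not a full RIPPER implementation: rules contain only equality tests, the rule is scored on a held-out pruning set with the metric (p - n) / (p + n), where p and n are the positive and negative records the rule covers, and trailing conditions are dropped when doing so does not lower the score. The helper names and the tie-breaking choice (prefer the shorter rule) are assumptions made for this example.

```python
def covers(rule, record):
    """A rule is a list of (attribute, value) equality tests joined by AND."""
    return all(record[attr] == value for attr, value in rule)


def prune_metric(rule, prune_set):
    """(p - n) / (p + n) on the pruning set; -1.0 if the rule covers nothing."""
    p = sum(1 for rec, label in prune_set if label and covers(rule, rec))
    n = sum(1 for rec, label in prune_set if not label and covers(rule, rec))
    return (p - n) / (p + n) if p + n else -1.0


def prune_rule(rule, prune_set):
    """Return the prefix of `rule` with the best metric on the pruning set.

    Only trailing conditions are removed, at least one condition is kept,
    and ties go to the shorter rule (an assumption of this sketch).
    """
    best = list(rule)
    best_score = prune_metric(best, prune_set)
    for cut in range(len(rule) - 1, 0, -1):
        candidate = list(rule[:cut])
        score = prune_metric(candidate, prune_set)
        if score >= best_score:
            best, best_score = candidate, score
    return best
```

Because every rule is shortened in this way before the next one is learned, the final rule set tends to stay compact, which is the property the answer above relies on.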
