1. If you had to choose between the naive Bayes and k-nearest neighbor classifiers, which would you prefer for a classification problem where there are numerous missing values in the training and test data sets? Indicate your choice of classifier and briefly explain why the other one may not work so well.
2. Both MDL and the pessimistic error estimate are techniques used for incorporating model complexity. State one similarity and one difference between them in the context of decision trees.
Solution
1. Naive Bayes: Naive Bayes is a very simple classifier that makes a strong assumption about the input variables, namely that each attribute is conditionally independent of the others given the class. Nevertheless, it has been shown to be effective in a large number of problem domains.
K-nearest neighbor classifier: K-Nearest Neighbors (KNN) classification divides the data into a test set and a training set. For each row of the test set, the K nearest training set objects are found, and the classification is determined by majority vote, with ties broken at random. If there are ties for the K-th nearest neighbor, all candidates are included in the vote.
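A minimal sketch of this voting procedure, assuming numeric feature vectors and Euclidean distance (the function names such as knn_predict are illustrative, not from any particular library):

```python
import random
from collections import Counter

def sq_euclidean(a, b):
    # Squared Euclidean distance is enough for ranking neighbors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3):
    # Rank all training points by distance to the query.
    dists = sorted((sq_euclidean(x, query), y) for x, y in zip(train_X, train_y))
    # Include every point tied with the K-th nearest neighbor.
    cutoff = dists[k - 1][0]
    votes = [label for d, label in dists if d <= cutoff]
    # Majority vote; break ties between classes at random.
    counts = Counter(votes)
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    return random.choice(winners)

# Example: two well-separated classes in 2-D.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # -> "a"
```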
I would choose Naive Bayes because it is very simple: training amounts to doing a bunch of counts. If the Naive Bayes conditional independence assumption actually holds, the classifier converges faster than discriminative models like logistic regression, so it needs less training data; even if the assumption does not hold, a Naive Bayes classifier still often performs well in practice. It is a good bet when you want something fast and easy that performs reasonably well, and as a high-bias/low-variance classifier it also works well when the training set is small. Its main disadvantage is that it cannot learn interactions between features. Most importantly for this problem, Naive Bayes handles missing values naturally: a missing attribute can simply be omitted from the class-conditional probability product, both when estimating the counts and when classifying. K-nearest neighbors, by contrast, relies on distance computations that require all attribute values, so missing values must be imputed or the distance measure modified, which makes it less suitable when missing values are numerous.
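A minimal sketch of how Naive Bayes training reduces to counting and why missing values are easy to skip. This assumes categorical attributes, Laplace smoothing, and None as the missing-value marker; all names (nb_train, nb_predict) are illustrative:

```python
from collections import Counter, defaultdict

def nb_train(X, y):
    # Count class frequencies and, per (class, attribute), value frequencies.
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # key: (class label, attribute index)
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            if v is not None:              # missing values simply don't contribute
                value_counts[(label, i)][v] += 1
    return class_counts, value_counts

def nb_predict(class_counts, value_counts, row, alpha=1.0):
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = c / total                  # prior P(class)
        for i, v in enumerate(row):
            if v is None:                  # missing attribute: skip its factor
                continue
            counts = value_counts[(label, i)]
            n_values = len(counts) + 1     # crude estimate of the value domain size
            # Laplace-smoothed P(attribute value | class)
            score *= (counts[v] + alpha) / (c + alpha * n_values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Example: the second attribute is missing both in training and at prediction time.
X = [("sunny", "hot"), ("sunny", None), ("rainy", "cool"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
model = nb_train(X, y)
print(nb_predict(*model, ("rainy", None)))  # -> "yes"
```

The same skipping trick is not available to KNN, since omitting an attribute changes the scale of the distance and makes test rows with different missing patterns incomparable.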
