It is common to pre-process the feature vectors in $\mathbb{R}^d$ before passing them to a learning algorithm. Two simple and generic ways to pre-process are as follows.

Centering: Subtract the mean $\mu := \frac{1}{|S|} \sum_{(x,y) \in S} x$ (of the training data) from every feature vector: $x \mapsto x - \mu$.

Standardization: Perform centering, and then divide every feature by the per-feature standard deviation $\sigma_i := \sqrt{\frac{1}{|S|} \sum_{(x,y) \in S} (x_i - \mu_i)^2}$:
$$(x_1, x_2, \ldots, x_d) \mapsto \left( \frac{x_1 - \mu_1}{\sigma_1}, \frac{x_2 - \mu_2}{\sigma_2}, \ldots, \frac{x_d - \mu_d}{\sigma_d} \right).$$

For each of the following learning algorithms, and each of the above pre-processing transformations, does the transformation affect the learning algorithm?

(a) The classifier based on the generative model where class conditional distributions are multivariate Gaussian distributions with a fixed covariance equal to the identity matrix $I$. Assume MLE is used for parameter estimation.

(b) The 1-NN classifier using Euclidean distance.

(c) The greedy decision tree learning algorithm with axis-aligned splits. (For concreteness, assume the Gini index is used as the uncertainty measure, and the algorithm stops after 20 leaf nodes.)

(d) Empirical Risk Minimization: the (intractable) algorithm that finds the linear classifier (both the weight vector and threshold) with the smallest training error rate.

To make this more precise, consider any training data set $S$ from $X \times Y$, and let $f_S : X \rightarrow Y$ be the classifier obtained from the learning algorithm with training set $S$. Let $\phi : X \rightarrow X'$ be the pre-processing transformation, and let $f_{S'} : X' \rightarrow Y$ be the classifier obtained from the learning algorithm with training set $S'$, where $S'$ is the data set from $X' \times Y$ containing $(\phi(x), y)$ for each $(x, y) \in S$. We say $\phi$ affects the learning algorithm if there is a training set $S$ such that $f_S$ is not the same classifier as $x \mapsto f_{S'}(\phi(x))$.

You should assume the following: (i) the per-feature standard deviations are never zero; (ii) there are never any "ties" whenever you compute an arg max or an arg min; (iii) there are no issues with numerical precision or computational efficiency.

Solution

Answer:

First, it helps to recall what centering and standardization do, and when each of them can change the classifier that a learning algorithm produces.

Reason for centering: Centering is useful when the estimated intercept or group effects must be interpretable, and it is not necessary to center all variables; only the variables of interest need to be centered. For this question, the key property is that centering translates every point by the same vector, so any algorithm that depends on the data only through translation-invariant quantities (such as differences or Euclidean distances between points) is unaffected by it.

Reason for standardization: Standardization puts variables that are measured in different units on a common scale, so that estimated coefficients are comparable; when it is applied, all variables are standardized. Unlike centering, it rescales each feature by a different factor, so it changes Euclidean geometry and can affect any algorithm that compares features through a fixed metric.
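Before answering the four parts, here is a minimal NumPy sketch (toy data, purely for illustration) of the two transformations exactly as defined in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # toy training matrix: 100 points in R^3

# Centering: subtract the per-feature mean from every feature vector.
mu = X.mean(axis=0)
X_centered = X - mu

# Standardization: center, then divide by the per-feature standard
# deviation (nonzero by assumption (i) in the question).
sigma = X.std(axis=0)
X_standardized = (X - mu) / sigma
```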

1. A multivariate Gaussian distribution is characterized by a mean vector and a covariance matrix (it is also called a multivariate normal distribution; with mean zero and covariance equal to the identity matrix $I$ it is the standard normal). Here the covariance is fixed at $I$, so MLE estimates only the class priors $\pi_y$ and the class means $\mu_y$, and the classifier predicts $\arg\max_y \log \pi_y - \frac{1}{2} \lVert x - \mu_y \rVert^2$. Centering does not affect this classifier: shifting every training point, and hence every estimated mean, by the same $\mu$ leaves all distances $\lVert x - \mu_y \rVert$ unchanged. Standardization does affect it: each coordinate is divided by a different $\sigma_i$, which changes the Euclidean distances, and because the covariance is fixed at $I$ rather than estimated, the model cannot absorb the rescaling.
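To make point 1 concrete, here is a self-contained sketch on toy data (the helper `fit_predict` is our own illustration, not a library routine). Centering leaves the predictions unchanged, while standardization typically changes them:

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Generative classifier with Gaussian class conditionals, covariance
    fixed at I, and MLE priors/means: argmax_y log(pi_y) - ||x - mu_y||^2 / 2."""
    classes = np.unique(y_train)
    log_prior = np.array([np.log(np.mean(y_train == c)) for c in classes])
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmax(log_prior - 0.5 * d2, axis=1)]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * np.array([1.0, 10.0])   # very different feature scales
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mu, sigma = X.mean(axis=0), X.std(axis=0)

base = fit_predict(X, y, X)
centered = fit_predict(X - mu, y, X - mu)
standardized = fit_predict((X - mu) / sigma, y, (X - mu) / sigma)

print(np.array_equal(base, centered))      # True: translation preserves distances
print(np.array_equal(base, standardized))  # typically False: rescaling changes the
                                           # distances in the fixed-I metric
```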

2. The 1-NN classifier depends on the data only through Euclidean distances between points, so centering does not affect it: $\lVert (x - \mu) - (x' - \mu) \rVert = \lVert x - x' \rVert$. Standardization does affect it: dividing coordinates by different per-feature scales changes which training point is nearest, because features with large scales no longer dominate the distance.
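A similar toy check for 1-NN (again, `nn_predict` is our own helper): translating the data leaves every pairwise distance unchanged, while per-feature rescaling can change which training point is nearest:

```python
import numpy as np

def nn_predict(X_train, y_train, X_test):
    """1-NN under Euclidean distance."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2)) * np.array([1.0, 100.0])   # feature 2 dominates raw distances
y = rng.integers(0, 2, size=50)
X_test = rng.normal(size=(20, 2)) * np.array([1.0, 100.0])
mu, sigma = X.mean(axis=0), X.std(axis=0)

base = nn_predict(X, y, X_test)
centered = nn_predict(X - mu, y, X_test - mu)
scaled = nn_predict((X - mu) / sigma, y, (X_test - mu) / sigma)

print(np.array_equal(base, centered))  # True: ||u - v|| is translation-invariant
print(np.array_equal(base, scaled))    # typically False: different nearest neighbors
```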

3. A decision tree with axis-aligned splits behaves like a chain of if-else conditions of the form $x_i \le t$. Both centering and standardization are strictly increasing transformations of each feature, so any split $x_i \le t$ on the raw data corresponds to the split $(x_i - \mu_i)/\sigma_i \le (t - \mu_i)/\sigma_i$ on the transformed data, and the two splits induce exactly the same partition of the training points. Since the Gini index depends only on the label counts in each partition, the greedy algorithm grows the same tree, and neither transformation affects it.
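The threshold-mapping argument can be verified directly. In this sketch, a split at an (arbitrarily chosen) threshold on the raw feature induces exactly the same partition of the points as the mapped threshold on the standardized feature:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=1000)    # one feature column
mu, sigma = x.mean(), x.std()
t = 4.2                                          # an axis-aligned split threshold

left_raw = x <= t
left_std = (x - mu) / sigma <= (t - mu) / sigma  # the mapped threshold

# Identical partitions mean identical Gini indices at every candidate
# split, so the greedy algorithm grows the same tree either way.
print(np.array_equal(left_raw, left_std))        # True
```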

4. ERM over linear classifiers is affected by neither transformation: both are invertible affine maps, and an invertible affine change of coordinates can be absorbed into the weight vector and threshold. If $f(x) = \mathrm{sign}(w \cdot x - b)$ attains some training error rate on the raw data, then the weights $w_i' = w_i \sigma_i$ and threshold $b' = b - w \cdot \mu$ attain exactly the same error rate on the standardized data (and vice versa), so minimizing the training error rate yields the same classifier in either coordinate system.
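The absorption argument in code (a sketch with an arbitrary weight vector and threshold): the rescaled weights and shifted threshold label the standardized data exactly as the original classifier labels the raw data, so the two ERM problems range over the same set of classifiers:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
mu, sigma = X.mean(axis=0), X.std(axis=0)

w, b = rng.normal(size=3), 0.7   # an arbitrary linear classifier on the raw data
w_new = w * sigma                # absorb the per-feature scaling into the weights
b_new = b - w @ mu               # absorb the shift into the threshold

raw = np.sign(X @ w - b)
std = np.sign(((X - mu) / sigma) @ w_new - b_new)
print(np.array_equal(raw, std))  # True: the same classifier in new coordinates
```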
