Explain how variable selection procedures can be used to choose a set of independent variables for an estimated regression equation.
Solution
The set of predictor variables used in the final regression model should be chosen in a scientific and systematic manner by analysing the data. This process is called variable selection.
The process must ensure that even a variable with only a remote effect on the dependent variable is not ignored. At the same time, too many independent variables should not be included, because irrelevant additions make the estimates less precise and the predictions less accurate. Hence care must be taken to balance goodness of fit against accuracy.
The variable selection procedures are:
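The trade-off described above can be illustrated with a small sketch. It is not part of the original solution: the simulated data, the helper names r_squared and adjusted_r_squared, and the use of adjusted R-squared as a penalised measure of fit are all assumptions made for illustration. The point it shows is that ordinary R-squared never decreases when variables are added, even irrelevant ones, so raw fit alone cannot decide which variables belong in the model.

```python
import numpy as np

def r_squared(A, y):
    """R-squared of an ordinary least squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Simulated data (an assumption): y depends on x only; the other 10 columns are noise.
rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)
noise_vars = rng.normal(size=(n, 10))
y = 2.0 * x + rng.normal(size=n)

ones = np.ones(n)
small = np.column_stack([ones, x])              # intercept + the relevant variable
large = np.column_stack([ones, x, noise_vars])  # plus 10 irrelevant variables

r2_small = r_squared(small, y)
r2_large = r_squared(large, y)

print("R-squared, 1 variable:   ", round(r2_small, 4))
print("R-squared, 11 variables: ", round(r2_large, 4))
print("Adjusted, 1 variable:    ", round(adjusted_r_squared(r2_small, n, 1), 4))
print("Adjusted, 11 variables:  ", round(adjusted_r_squared(r2_large, n, 11), 4))
```

The larger model always reports an R-squared at least as high as the smaller one. The adjusted figure charges for each extra variable, so it may or may not rise, which is why a selection procedure needs a criterion stricter than raw fit.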
i) Forward (step-up) selection: This method begins with no candidate variables in the model. At each step, the variable that gives the largest increase in R-squared is added, and this continues until the increase in R-squared is no longer significant. Once a variable has been included by this procedure, it is not deleted.
ii) Backward (step-down) selection:
This is the reverse of the step-up process. It begins with all candidate variables in the model, and the least significant variable is removed one at a time until only significant variables remain. The significance level is normally set by the user, taking into account the purpose of the regression.
iii) Stepwise selection: This is a combination of the two methods above. Separate significance levels are set for adding and for removing variables. Each time a variable is added, all variables already in the model are checked against the removal significance level. If a non-significant variable is found, it is deleted.
iv) Minimum MSE:
This is similar to stepwise selection, except that a minimum change in the mean square error is fixed in advance. At each step, the variable whose addition or removal causes the largest decrease in the mean square error is added or removed (i.e. its previous position is reversed). This process is continued until no variable causes a considerable decrease in the mean square error.
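The forward and backward procedures in the list above can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation. The function names (sse, partial_f, forward_selection, backward_elimination) are hypothetical, the data are simulated, and significance is judged by comparing a partial F statistic with a fixed threshold of 4.0 instead of a user-chosen significance level. Choosing the variable with the largest partial F at each step is equivalent to choosing the one with the largest increase in R-squared.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X, y, reduced, full):
    """Partial F statistic for the single extra variable in `full` vs `reduced`."""
    sse_reduced, sse_full = sse(X, y, reduced), sse(X, y, full)
    df_resid = len(y) - len(full) - 1  # observations minus slopes minus intercept
    return (sse_reduced - sse_full) / (sse_full / df_resid)

def forward_selection(X, y, f_enter=4.0):
    """Start with no variables; add the best candidate while it is significant."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: partial_f(X, y, selected, selected + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_enter:   # best candidate is not significant: stop
            break
        selected.append(best)        # once included, a variable is never removed
        remaining.remove(best)
    return selected

def backward_elimination(X, y, f_remove=4.0):
    """Start with all variables; drop the least significant one at a time."""
    selected = list(range(X.shape[1]))
    while selected:
        scores = {j: partial_f(X, y, [k for k in selected if k != j], selected)
                  for j in selected}
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_remove:  # every remaining variable is significant
            break
        selected.remove(worst)
    return selected

# Simulated data (an assumption): only columns 0 and 2 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

print("forward: ", forward_selection(X, y))
print("backward:", sorted(backward_elimination(X, y)))
```

Stepwise selection would combine the two loops: after each addition in forward_selection, the variables already selected would be re-tested against f_remove and dropped if they fall below it. Because the thresholds correspond to roughly a 5% significance level, an irrelevant variable can occasionally be retained by chance.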
