Suppose you overhear the following conversation in your stud
Suppose you overhear the following conversation in your study group:
Jay-Z: When we add variable X3 to our regression, Adjusted-R2 (R-bar-squared) goes down. I don’t think it should be in the regression.
Beyonce: Adjusted R2 does decrease, but theory suggests that X3 should be included. Let’s keep it.
Jay-Z: Why? It doesn’t matter if X3 is in the regression anyway--the coefficient on X2 is what we care about.
How should Beyonce respond to Jay-Z?
Solution
BEYONCE RESPOND TO JAYZ:
R2 shows the linear relationship between the independent variables and the dependent variable. It is defined as 1(SSE/SSTO) which is the sum of squared errors divided by the total sum of squares. SSTO=SSE+SSR which are the total error and total sum of the regression squares. As independent variables are added SSR will continue to rise (and since SSTO is fixed) SSE will go down and R2 will continually rise irrespective of how valuable the variables you added are.
The Adjusted R2 is attempting to account for statistical shrinkage. Models with tons of predictors tend to perform better in sample than when tested out of sample. The adjusted R2 \"penalizes\" you for adding the extra predictor variables that don\'t improve the existing model. It can be helpful in model selection. Adjusted R2 will equal R2 for one predictor variable. As you add variables, it will be smaller than R2.
