Data were collected on 54 observations of a response of interest, y, and four potential predictor variables, x1, x2, x3 and x4. The output from regression analyses of the data is shown below.

a) For the best subsets regression analysis, which is the best simple linear regression model for predicting y? Briefly explain your criteria for choosing this model.

b) For all of the models listed in the best subsets regression analysis, which model is best according to the MSE criterion?

c) For all of the models listed in the best subsets regression analysis, which model is best according to the BIC criterion?

d) Is the variable from your best simple linear regression model (from part a) included in the model with the lowest overall MSE (part b)? Briefly explain why it could happen that the best single variable is not in the best overall model.

e) Following the best subsets regression results, the sums of squares for regression and error (also called residual) are displayed for several models. Using the regression sums of squares information for the full model containing all four x variables, calculate i) the R² value for the full model, ii) the F statistic for the test of H0: β1 = β2 = β3 = β4 = 0 and iii) the standard deviation of the residuals for the full model.

f) Using the regression sums of squares information, test the null hypothesis H0: β2 = β4 = 0 for the full model. Calculate an F statistic, obtain a tabled F value and report the conclusion of your test. Use α = .05.

g) Using the regression sums of squares information for the model containing the terms x1, x2 and x4, calculate the t statistic for the hypothesis H0: β1 = 0, where β1 is the coefficient of x1. (Hint: first calculate an F statistic for H0 and then take its square root to obtain the t value. Assume all regression coefficients are positive.)

Results for the best subsets regression analysis

The REG Procedure
Dependent Variable: y
R-Square Selection Method

Number in Model   R-Square       C(p)   Root MSE         BIC   Variables in Model
       1            0.5259   757.7545    0.19035   -180.6717   x4
       1            0.4414   901.7010    0.20662   -171.8944   x3
       1            0.3447   1066.354    0.22378   -163.3427   x2
       1            0.1257   1439.527    0.25849   -147.8645   x1
       2            0.8049   284.3219    0.12329   -227.5857   x2 x3
       2            0.6847   489.1148    0.15673   -202.2469   x3 x4
       2            0.6521   544.6919    0.16464   -197.0205   x1 x3
       2            0.6440   558.4141    0.16654   -195.8044   x2 x4
       3            0.9712     3.0002    0.04781   -321.7335   x1 x2 x3
       3            0.8758   165.5090    0.09935   -250.6043   x2 x3 x4
       3            0.7215   428.5088    0.14878   -208.5846   x1 x3 x4
       3            0.6449   558.9234    0.16799   -195.7567   x1 x2 x4
       4            0.9712     5.0000    0.04830   -319.5296   x1 x2 x3 x4

Results from several models to predict y using various combinations of x variables.

Terms in model    SSRegression   SSE or SSResidual
x2                     1.36979             2.60397
x3                     1.75385             2.21991
x1, x3                 2.59127             1.38250
x2, x4                 2.55926             1.41450
x1, x2, x3             3.85947             0.11430
x1, x2, x4             2.56274             1.41103
x1, x2, x3, x4         3.85947             0.11430

Solution

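a. A simple linear regression model has exactly one predictor, so only the four one-variable models are candidates. Among models with the same number of predictors, the best is the one with the highest R-squared, because R-squared is the proportion of the variation in y explained by the model. With the number of predictors fixed, the same model also has the lowest Root MSE, C(p) and BIC.

So the model with x4 alone, with R-squared = 0.5259, is the answer.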

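b. Under the MSE criterion we choose the model with the smallest mean squared error. The output reports Root MSE, and since MSE is its square, the model with the smallest Root MSE also has the smallest MSE.

So the model with x1, x2 and x3 is the answer, with Root MSE = 0.04781, that is MSE = 0.04781² ≈ 0.002286. The full four-variable model has the same R-squared (0.9712) but a slightly larger Root MSE (0.04830), because it spends one more degree of freedom without reducing the error sum of squares.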

c. According to the BIC criterion, the model with the lowest (here, most negative) BIC is the best.

So the model with x1, x2 and x3, with BIC = -321.7335, is the answer.
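The selections in parts a to c can be checked against the best subsets table. The following is a minimal Python sketch, assuming the table rows are retyped by hand; the names `models`, `best_simple`, `best_mse` and `best_bic` are illustrative only and are not part of the SAS output. Because the output reports Root MSE rather than MSE, the sketch squares that column before comparing.

```python
# Rows retyped from the best subsets output:
# (variables in model, R-square, Root MSE, BIC)
models = [
    (("x4",), 0.5259, 0.19035, -180.6717),
    (("x3",), 0.4414, 0.20662, -171.8944),
    (("x2",), 0.3447, 0.22378, -163.3427),
    (("x1",), 0.1257, 0.25849, -147.8645),
    (("x2", "x3"), 0.8049, 0.12329, -227.5857),
    (("x3", "x4"), 0.6847, 0.15673, -202.2469),
    (("x1", "x3"), 0.6521, 0.16464, -197.0205),
    (("x2", "x4"), 0.6440, 0.16654, -195.8044),
    (("x1", "x2", "x3"), 0.9712, 0.04781, -321.7335),
    (("x2", "x3", "x4"), 0.8758, 0.09935, -250.6043),
    (("x1", "x3", "x4"), 0.7215, 0.14878, -208.5846),
    (("x1", "x2", "x4"), 0.6449, 0.16799, -195.7567),
    (("x1", "x2", "x3", "x4"), 0.9712, 0.04830, -319.5296),
]

# Part a: highest R-square among the one-variable models only.
best_simple = max((m for m in models if len(m[0]) == 1), key=lambda m: m[1])

# Part b: smallest MSE over all models, where MSE = (Root MSE) ** 2.
best_mse = min(models, key=lambda m: m[2] ** 2)

# Part c: smallest (most negative) BIC over all models.
best_bic = min(models, key=lambda m: m[3])

print("a) best simple model:", best_simple[0], "R-square =", best_simple[1])
print("b) lowest MSE:", best_mse[0], "MSE =", round(best_mse[2] ** 2, 6))
print("c) lowest BIC:", best_bic[0], "BIC =", best_bic[3])
```

Run as written, the sketch picks x4 as the best single-predictor model and the x1, x2, x3 model under both the MSE and BIC criteria.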

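d. No. The best simple linear regression model uses x4, but the model with the lowest overall MSE contains x1, x2 and x3 and does not include x4.

This can happen when the predictors are correlated with one another. A variable can be the strongest single predictor of y and still be redundant once other variables are in the model, because a combination of those variables carries the same information and more. Here, adding x4 to the model with x1, x2 and x3 leaves R-squared at 0.9712 and raises Root MSE from 0.04781 to 0.04830.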
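Parts e to g are not worked in the solution above. The following is a hedged Python sketch of the arithmetic they ask for, using only the sums of squares table and n = 54. It assumes the usual formulas: R² = SSR / (SSR + SSE), overall F = (SSR / p) / (SSE / (n - p - 1)), and partial F = ((SSE_reduced - SSE_full) / q) / MSE_full. The variable names are illustrative, and the critical value `F_CRIT` is an approximate figure read from a standard F table for 2 and 49 degrees of freedom, not something computed here.

```python
import math

n = 54  # number of observations

# Sums of squares copied from the "Terms in model" table.
ssr_full, sse_full = 3.85947, 0.11430  # x1, x2, x3, x4
sse_x1x3 = 1.38250                     # reduced model for part f
sse_x2x4 = 1.41450                     # reduced model for part g
sse_x1x2x4 = 1.41103                   # larger model for part g

# Part e: full-model summary statistics.
sst = ssr_full + sse_full              # total sum of squares
r2_full = ssr_full / sst               # i) R-squared
df_err_full = n - 4 - 1                # 49 error degrees of freedom
mse_full = sse_full / df_err_full
f_overall = (ssr_full / 4) / mse_full  # ii) F for H0: all four slopes are 0
s_full = math.sqrt(mse_full)           # iii) residual standard deviation

# Part f: partial F test of H0: beta2 = beta4 = 0 in the full model.
f_partial = ((sse_x1x3 - sse_full) / 2) / mse_full
F_CRIT = 3.19  # assumption: approximate tabled F(.05; 2, 49)
reject_h0 = f_partial > F_CRIT

# Part g: t statistic for x1 in the model with x1, x2 and x4.
mse_x1x2x4 = sse_x1x2x4 / (n - 3 - 1)  # 50 error degrees of freedom
f_x1 = (sse_x2x4 - sse_x1x2x4) / mse_x1x2x4
t_x1 = math.sqrt(f_x1)                 # positive root, per the hint

print(f"e) R2 = {r2_full:.4f}, F = {f_overall:.2f}, s = {s_full:.5f}")
print(f"f) F = {f_partial:.2f}, reject H0: {reject_h0}")
print(f"g) F = {f_x1:.4f}, t = {t_x1:.4f}")
```

Under these assumptions the full model gives R² of about 0.9712 and a residual standard deviation of about 0.0483, both matching the best subsets output. The partial F statistic in part f is far above the tabled value, so H0: β2 = β4 = 0 is rejected at α = .05. The t statistic for x1 in part g is small (about 0.35), so x1 adds little once x2 and x4 are in the model.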
