Data was collected on 54 observations on a response of interest, y, and four potential predictor variables, x1, x2, x3 and x4. The output from regression analyses of the data is shown below.

a) For the best subsets regression analysis, which is the best simple linear regression model for predicting y? Briefly explain your criteria for choosing this model.

b) For all of the models listed in the best subsets regression analysis, which model is best according to the MSE criterion?

c) For all of the models listed in the best subsets regression analysis, which model is best according to the BIC criterion?

d) Is the variable from your best simple linear regression model (from part a) included in the model with the lowest overall MSE (part b)? Briefly explain why it could happen that the best single variable is not in the best overall model.

e) Following the best subsets regression results, the sums of squares for regression and error (also called residual) are displayed for several models. Using the regression sums of squares information for the full model containing all four x variables, calculate i) the R-square value for the full model, ii) the F statistic for the test of H0: β1 = β2 = β3 = β4 = 0, and iii) the standard deviation of the residuals for the full model.

f) Using the regression sums of squares information, test the null hypothesis H0: β2 = β4 = 0 for the full model. Calculate an F statistic, obtain a tabled F value and report the conclusion of your test. Use α = .05.

g) Using the regression sum of squares information for the model containing the terms x1, x2 and x4, calculate the t statistic for the hypothesis H0: β1 = 0, where β1 is the coefficient of x1. (Hint: first calculate an F statistic for H0 and then take its square root to obtain the t value. Assume all regression coefficients are positive.)

Results for the best subsets regression analysis

The REG Procedure                                              Dependent variable: y

R-Square Selection Method

Number in Model   R-Square   C(p)        Root MSE   BIC         Variables in Model

1                 0.5259     757.7545    0.19035    -180.6717   x4
1                 0.4414     901.7010    0.20662    -171.8944   x3
1                 0.3447     1066.354    0.22378    -163.3427   x2
1                 0.1257     1439.527    0.25849    -147.8645   x1
2                 0.8049     284.3219    0.12329    -227.5857   x2 x3
2                 0.6847     489.1148    0.15673    -202.2469   x3 x4
2                 0.6521     544.6919    0.16464    -197.0205   x1 x3
2                 0.6440     558.4141    0.16654    -195.8044   x2 x4
3                 0.9712     3.0002      0.04781    -321.7335   x1 x2 x3
3                 0.8758     165.5090    0.09935    -250.6043   x2 x3 x4
3                 0.7215     428.5088    0.14878    -208.5846   x1 x3 x4
3                 0.6449     558.9234    0.16799    -195.7567   x1 x2 x4
4                 0.9712     5.0000      0.04830    -319.5296   x1 x2 x3 x4

Results from several models to predict y using various combinations of x variables.

Terms in model    SSRegression   SSE or SSResidual

x2                1.36979        2.60397
x3                1.75385        2.21991
x1, x3            2.59127        1.38250
x2, x4            2.55926        1.41450
x1, x2, x3        3.85947        0.11430
x1, x2, x4        2.56274        1.41103
x1, x2, x3, x4    3.85947        0.11430

Solution

a)

Each row of the output represents a different model. "Number in Model" is the number of predictors in the model, and "Variables in Model" lists the predictors that are present. A good model should have a high R-square (or adjusted R-square), a small Root MSE, a small BIC, and a Mallows' C(p) close to the number of predictors in the model plus one for the constant. Adjusted R-square is recommended over R-square for comparing models with different numbers of terms; for models with the same number of terms, all of these criteria give the same ranking, so R-square is sufficient.

Hence the best simple linear regression model is the one with x4. Of the four one-variable models it has the highest R-square (0.5259) and the lowest C(p), Root MSE and BIC.
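The comparison among the one-variable models can be checked with a short Python sketch. The numbers are transcribed from the best subsets output above; the names `one_var_models`, `best_by_r2` and so on are illustrative and are not part of the SAS output.

```python
# One-predictor rows from the best subsets output:
# (variable, R-square, C(p), Root MSE, BIC)
one_var_models = [
    ("x4", 0.5259, 757.7545, 0.19035, -180.6717),
    ("x3", 0.4414, 901.7010, 0.20662, -171.8944),
    ("x2", 0.3447, 1066.354, 0.22378, -163.3427),
    ("x1", 0.1257, 1439.527, 0.25849, -147.8645),
]

# Higher R-square is better; lower C(p), Root MSE and BIC are better.
best_by_r2 = max(one_var_models, key=lambda m: m[1])
best_by_cp = min(one_var_models, key=lambda m: m[2])
best_by_rmse = min(one_var_models, key=lambda m: m[3])
best_by_bic = min(one_var_models, key=lambda m: m[4])

for label, best in [("R-square", best_by_r2), ("C(p)", best_by_cp),
                    ("Root MSE", best_by_rmse), ("BIC", best_by_bic)]:
    print(f"Best one-variable model by {label}: {best[0]}")
```

All four criteria pick the same variable because, for a fixed number of predictors, each is a monotone function of SSE.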

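b) The best model according to the MSE criterion is the one with x1, x2 and x3. Its Root MSE (0.04781) is the smallest of all the models listed, slightly below that of the four-variable model (0.04830), and the smallest Root MSE corresponds to the smallest MSE.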

c)

Models with a smaller Bayesian Information Criterion (BIC), also known as the Schwarz Bayesian Criterion (SBC), are estimated to predict better.

From the output above, the model with x1, x2 and x3 has the smallest BIC value (-321.7335), so it is the best regression model by this criterion.
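As a quick check of parts b and c, the following Python sketch scans every model in the best subsets output for the smallest Root MSE and the smallest BIC. The values are transcribed from the output above; the list name `models` and the tuple layout are assumptions made for illustration.

```python
# All rows of the best subsets output: (variables, Root MSE, BIC)
models = [
    ("x4", 0.19035, -180.6717),
    ("x3", 0.20662, -171.8944),
    ("x2", 0.22378, -163.3427),
    ("x1", 0.25849, -147.8645),
    ("x2 x3", 0.12329, -227.5857),
    ("x3 x4", 0.15673, -202.2469),
    ("x1 x3", 0.16464, -197.0205),
    ("x2 x4", 0.16654, -195.8044),
    ("x1 x2 x3", 0.04781, -321.7335),
    ("x2 x3 x4", 0.09935, -250.6043),
    ("x1 x3 x4", 0.14878, -208.5846),
    ("x1 x2 x4", 0.16799, -195.7567),
    ("x1 x2 x3 x4", 0.04830, -319.5296),
]

# Smaller is better for both criteria.
best_mse = min(models, key=lambda m: m[1])
best_bic = min(models, key=lambda m: m[2])

print("Lowest Root MSE:", best_mse[0], best_mse[1])
print("Lowest BIC:", best_bic[0], best_bic[2])
```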

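d)

No. The best simple linear regression model uses x4, but the model with the lowest overall MSE contains x1, x2 and x3 and does not include x4.

This can happen because the predictors are correlated with one another. On its own, x4 explains more of the variation in y than any other single variable (R-square 0.5259), but the information it carries is also supplied by x1, x2 and x3 together: the regression sum of squares is 3.85947 both with and without x4 once those three variables are in the model. Conversely, x1 is the weakest single predictor (R-square 0.1257), yet it contributes a great deal in combination with x2 and x3. Adding a variable never lowers R-square, but MSE divides SSE by the error degrees of freedom, so adding a variable that explains nothing new raises the MSE slightly (Root MSE 0.04830 versus 0.04781).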
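Parts e, f and g are not worked above. The following Python sketch shows how they could be computed from the sums of squares table, assuming the usual error degrees of freedom n - p - 1 and taking the regression sum of squares difference as the extra sum of squares. The helper `partial_f` and all variable names are illustrative. For part f the critical value is computed from the closed form of the F distribution that holds when the numerator has 2 degrees of freedom, rather than read from a table.

```python
import math

n = 54  # number of observations

# (SSRegression, SSE) transcribed from the sums of squares table
ss = {
    ("x1", "x3"): (2.59127, 1.38250),
    ("x2", "x4"): (2.55926, 1.41450),
    ("x1", "x2", "x4"): (2.56274, 1.41103),
    ("x1", "x2", "x3", "x4"): (3.85947, 0.11430),
}

def partial_f(larger, smaller):
    """Partial F for the terms in `larger` that are missing from `smaller`."""
    ssr_large, sse_large = ss[larger]
    ssr_small, _ = ss[smaller]
    q = len(larger) - len(smaller)      # number of coefficients tested
    df_error = n - len(larger) - 1      # error df of the larger model
    f_stat = ((ssr_large - ssr_small) / q) / (sse_large / df_error)
    return f_stat, q, df_error

# Part e: full model with all four predictors
full = ("x1", "x2", "x3", "x4")
ssr, sse = ss[full]
p = len(full)
mse_full = sse / (n - p - 1)
r2_full = ssr / (ssr + sse)        # i) R-square
f_full = (ssr / p) / mse_full      # ii) overall F on (4, 49) df
s_full = math.sqrt(mse_full)       # iii) residual standard deviation

# Part f: H0: beta2 = beta4 = 0, reduced model is (x1, x3)
f_24, q, df_err = partial_f(full, ("x1", "x3"))
alpha = 0.05
# With 2 numerator df, P(F > f) = (1 + 2f/d2)^(-d2/2), so the critical value is:
f_crit = (df_err / 2) * (alpha ** (-2 / df_err) - 1)
reject_h0 = f_24 > f_crit

# Part g: H0: beta1 = 0 in the model (x1, x2, x4), reduced model is (x2, x4)
f_1, _, _ = partial_f(("x1", "x2", "x4"), ("x2", "x4"))
t_1 = math.sqrt(f_1)  # positive root, since coefficients are assumed positive

print(f"e) R2 = {r2_full:.4f}, F = {f_full:.1f}, s = {s_full:.4f}")
print(f"f) F = {f_24:.1f}, critical F = {f_crit:.2f}, reject H0: {reject_h0}")
print(f"g) F = {f_1:.4f}, t = {t_1:.3f}")
```

Under these assumptions the sketch gives an R-square of about 0.9712, an overall F of about 413.6 and a residual standard deviation of about 0.0483 for part e, which agrees with the R-square and Root MSE reported for the four-variable model. For part f the partial F is about 271.8 against a critical value of about 3.19 on (2, 49) degrees of freedom, so H0 is rejected at the .05 level. For part g the t statistic is about 0.35.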
