Data were collected on 54 observations of a response of interest, y, and four potential predictor variables x1, x2, x3 and x4. The output from regression analyses of the data is shown below.
a) For the best subsets regression analysis, which is the best simple linear regression model for predicting y? Briefly explain your criteria for choosing this model.
b) For all of the models listed in the best subsets regression analysis, which model is best according to the MSE criterion?
c) For all of the models listed in the best subsets regression analysis, which model is best according to the BIC criterion?
d) Is the variable from your best simple linear regression model (from part a) included in the model with the lowest overall MSE (part b)? Briefly explain why it could happen that the best single variable is not in the best overall model.
e) Following the best subsets regression results, the sums of squares for regression and error (also called residual) are displayed for several models. Using the regression sums of squares information for the full model containing all four x variables, calculate i) the R2 value for the full model, ii) the F statistic for the test of H0: β1 = β2 = β3 = β4 = 0, and iii) the standard deviation of the residuals for the full model.
f) Using the regression sums of squares information, test the null hypothesis H0: β2 = β4 = 0 for the full model. Calculate an F statistic, obtain a tabled F value, and report the conclusion of your test; use α = .05.
g) Using the regression sum of squares information, for the model containing the terms x1, x2 and x4, calculate the t statistic for the hypothesis H0: β1 = 0, where β1 is the coefficient of x1. (Hint: first calculate an F statistic for H0 and then take its square root to obtain the t value. Assume all regression coefficients are positive.)
Results for the best subsets regression analysis
The REG Procedure Dependent variable: y
R-Square Selection Method

Number in Model  R-Square  C(p)  Root MSE  BIC  Variables in Model
1 0.5259 757.7545 0.19035 -180.6717 x4
1 0.4414 901.7010 0.20662 -171.8944 x3
1 0.3447 1066.354 0.22378 -163.3427 x2
1 0.1257 1439.527 0.25849 -147.8645 x1
2 0.8049 284.3219 0.12329 -227.5857 x2 x3
2 0.6847 489.1148 0.15673 -202.2469 x3 x4
2 0.6521 544.6919 0.16464 -197.0205 x1 x3
2 0.6440 558.4141 0.16654 -195.8044 x2 x4
3 0.9712 3.0002 0.04781 -321.7335 x1 x2 x3
3 0.8758 165.5090 0.09935 -250.6043 x2 x3 x4
3 0.7215 428.5088 0.14878 -208.5846 x1 x3 x4
3 0.6449 558.9234 0.16799 -195.7567 x1 x2 x4
4 0.9712 5.0000 0.04830 -319.5296 x1 x2 x3 x4
Results from several models to predict y using various combinations of x variables.
Terms in model SSRegression SSE or SSResidual
x2 1.36979 2.60397
x3 1.75385 2.21991
x1, x3 2.59127 1.38250
x2, x4 2.55926 1.41450
x1, x2, x3 3.85947 0.11430
x1, x2, x4 2.56274 1.41103
x1, x2, x3, x4 3.85947 0.11430
Solution
a. Part a asks for the best simple linear regression model, i.e. the best model with a single predictor. Among the one-variable models, the best is the one with the highest R-squared (equivalently, the lowest Root MSE), because R-squared is the proportion of variation in y explained by the model.
So the model containing x4 alone, with R-squared = 0.5259, is the answer.
b. Under the MSE criterion, we choose the model with the smallest error variance, i.e. the smallest Root MSE.
So x1 x2 x3, with Root MSE = 0.04781, is the answer.
c. According to the BIC criterion, the model with the lowest BIC is the best.
So x1 x2 x3 with BIC = -321.7335 is the answer.
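The selections in parts b and c can be checked mechanically from the table. A minimal Python sketch (only a representative subset of the table's rows is copied in):

```python
# Each tuple: (variables, R-squared, Root MSE, BIC), copied from the
# best subsets output above (a representative subset of rows).
models = [
    ("x4",          0.5259, 0.19035, -180.6717),
    ("x2 x3",       0.8049, 0.12329, -227.5857),
    ("x1 x2 x3",    0.9712, 0.04781, -321.7335),
    ("x1 x2 x3 x4", 0.9712, 0.04830, -319.5296),
]

best_by_mse = min(models, key=lambda m: m[2])  # smallest Root MSE
best_by_bic = min(models, key=lambda m: m[3])  # most negative BIC

print(best_by_mse[0])  # x1 x2 x3
print(best_by_bic[0])  # x1 x2 x3
```

Both criteria agree here, but they need not in general: BIC penalizes extra parameters, while Root MSE only adjusts for degrees of freedom.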
d. No. The best single variable, x4, does not appear in the best overall model (x1 x2 x3).
This can happen because the predictors are correlated: a combination of variables can jointly explain variation in y that makes the best single variable redundant, so x4 adds essentially nothing once x1, x2 and x3 are in the model. Indeed, adding x4 to x1 x2 x3 leaves R-squared unchanged at 0.9712.

