
Data was collected on 54 observations on a response of interest, y and four potential predictor variables x1, x2, x3 and x4. The output from regression analyses of the data is shown below.

a) For the best subsets regression analysis, which is the best simple linear regression model for predicting y? Briefly explain your criteria for choosing this model.

b) For all of the models listed in the best subsets regression analysis, which model is best according to the MSE criterion?

c) For all of the models listed in the best subsets regression analysis, which model is best according to the BIC criterion?

d) Is the variable from your best simple linear regression model (from part a) included in the model with the lowest overall MSE (part b)? Briefly explain why it could happen that the best single variable is not in the best overall model.

e) Following the best subsets regression results, the sums of squares for regression and error (also called residual) are displayed for several models. Using the regression sums of squares information for the full model containing all four x variables, calculate i) the R² value for the full model, ii) the F statistic for the test of H₀ : β₁ = β₂ = β₃ = β₄ = 0, and iii) the standard deviation of the residuals for the full model.

f) Using the regression sums of squares information, test the null hypothesis H₀ : β₂ = β₄ = 0 for the full model. Calculate an F statistic, obtain a tabled F value and report the conclusion of your test. Use α = .05.

g) Using the regression sum of squares information for the model containing the terms x1, x2 and x4, calculate the t statistic for the hypothesis H₀ : β₁ = 0, where β₁ is the coefficient of x1. (Hint: first calculate an F statistic for H₀ and then take its square root to obtain the t value. Assume all regression coefficients are positive.)

Results for the best subsets regression analysis

The REG Procedure
Dependent Variable: y
R-Square Selection Method

Number in Model   R-Square        C(p)   Root MSE         BIC   Variables in Model
1                   0.5259    757.7545    0.19035   -180.6717   x4
1                   0.4414    901.7010    0.20662   -171.8944   x3
1                   0.3447    1066.354    0.22378   -163.3427   x2
1                   0.1257    1439.527    0.25849   -147.8645   x1
2                   0.8049    284.3219    0.12329   -227.5857   x2 x3
2                   0.6847    489.1148    0.15673   -202.2469   x3 x4
2                   0.6521    544.6919    0.16464   -197.0205   x1 x3
2                   0.6440    558.4141    0.16654   -195.8044   x2 x4
3                   0.9712      3.0002    0.04781   -321.7335   x1 x2 x3
3                   0.8758    165.5090    0.09935   -250.6043   x2 x3 x4
3                   0.7215    428.5088    0.14878   -208.5846   x1 x3 x4
3                   0.6449    558.9234    0.16799   -195.7567   x1 x2 x4
4                   0.9712      5.0000    0.04830   -319.5296   x1 x2 x3 x4

Results from several models to predict y using various combinations of x variables.

Terms in model     SSRegression   SSE or SSResidual
x2                      1.36979             2.60397
x3                      1.75385             2.21991
x1, x3                  2.59127             1.38250
x2, x4                  2.55926             1.41450
x1, x2, x3              3.85947             0.11430
x1, x2, x4              2.56274             1.41103
x1, x2, x3, x4          3.85947             0.11430

Solution

a) Each line of the best subsets output represents a different model. "Number in Model" gives the number of predictors and "Variables in Model" lists them; the output displays up to four of the best models for each number of predictors. A good model should have a high R² and adjusted R², a small root MSE, and a Mallows' C(p) close to the number of parameters in the model (the predictors plus the constant). Adjusted R² is recommended over R² for comparing models with different numbers of terms.

The best simple linear regression model is therefore the one with x4 alone: among the one-variable models it has the highest R² (0.5259) and the smallest C(p) (757.7545), root MSE (0.19035) and BIC (-180.6717). Over all subsets, the strongest candidates are x1 x2 x3 and x1 x2 x3 x4.

b) The best model according to the MSE criterion is x1 x2 x3, since it has the smallest root MSE of all the listed models (0.04781, versus 0.04830 for the full model).

c) Models with a smaller Schwarz Bayesian criterion (SBC), also known as the Bayesian information criterion (BIC), are estimated to predict better.

From the output above, the x1 x2 x3 model has the smallest BIC value (-321.7335), so it is the best model by this criterion.
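The choices in parts a) to c) can be checked mechanically. The following is a minimal Python sketch, not part of the original SAS output: the rows are copied from the best subsets table, the fourth column is taken to be root MSE, and the names `models`, `best_single`, `best_mse` and `best_bic` are illustrative.

```python
# Rows copied from the best subsets output:
# (variables, R-square, C(p), root MSE, BIC)
models = [
    ("x4", 0.5259, 757.7545, 0.19035, -180.6717),
    ("x3", 0.4414, 901.7010, 0.20662, -171.8944),
    ("x2", 0.3447, 1066.354, 0.22378, -163.3427),
    ("x1", 0.1257, 1439.527, 0.25849, -147.8645),
    ("x2 x3", 0.8049, 284.3219, 0.12329, -227.5857),
    ("x3 x4", 0.6847, 489.1148, 0.15673, -202.2469),
    ("x1 x3", 0.6521, 544.6919, 0.16464, -197.0205),
    ("x2 x4", 0.6440, 558.4141, 0.16654, -195.8044),
    ("x1 x2 x3", 0.9712, 3.0002, 0.04781, -321.7335),
    ("x2 x3 x4", 0.8758, 165.5090, 0.09935, -250.6043),
    ("x1 x3 x4", 0.7215, 428.5088, 0.14878, -208.5846),
    ("x1 x2 x4", 0.6449, 558.9234, 0.16799, -195.7567),
    ("x1 x2 x3 x4", 0.9712, 5.0000, 0.04830, -319.5296),
]

# Part a: among one-variable models, take the highest R-square.
single = [m for m in models if len(m[0].split()) == 1]
best_single = max(single, key=lambda m: m[1])

# Part b: smallest root MSE (equivalently, smallest MSE) over all models.
best_mse = min(models, key=lambda m: m[3])

# Part c: smallest BIC over all models.
best_bic = min(models, key=lambda m: m[4])

print("a) best single predictor:", best_single[0])
print("b) lowest MSE:", best_mse[0])
print("c) lowest BIC:", best_bic[0])
```

Ranking by root MSE gives the same ordering as ranking by MSE, because the square root is monotone.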

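d) No. The best single predictor, x4, does not appear in the model with the lowest MSE, x1 x2 x3. Adding x4 to that model gives the full model x1 x2 x3 x4, which has the same R² (0.9712) and the same regression sum of squares (3.85947) but a slightly larger root MSE (0.04830 versus 0.04781), so x4 contributes essentially nothing once x1, x2 and x3 are present.

This can happen because the predictors are related to one another. A variable that is the best predictor on its own can become redundant when other variables that jointly carry the same information are in the model. Conversely, x1 alone has a very low R² of 0.1257, yet R² rises sharply when it is combined with x2 and x3 (0.8049 for x2 x3 versus 0.9712 for x1 x2 x3).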
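Parts e) to g) follow from the sums of squares table using the usual extra-sum-of-squares F test. The following is a minimal Python sketch under stated assumptions: n = 54 comes from the problem statement, the sums of squares are copied from the table, the tabled critical value `F_CRIT` is an approximate F(0.05; 2, 49) read from a standard F table, and the helper `partial_f` is illustrative.

```python
import math

n = 54  # number of observations, from the problem statement

# (SSRegression, SSE) pairs copied from the sums of squares table
ss = {
    ("x1", "x3"): (2.59127, 1.38250),
    ("x2", "x4"): (2.55926, 1.41450),
    ("x1", "x2", "x4"): (2.56274, 1.41103),
    ("x1", "x2", "x3", "x4"): (3.85947, 0.11430),
}

def partial_f(full, reduced):
    """Extra-sum-of-squares F statistic for the terms in `full` but not in `reduced`."""
    ssr_full, sse_full = ss[full]
    ssr_reduced, _ = ss[reduced]
    q = len(full) - len(reduced)      # number of coefficients being tested
    df_error = n - len(full) - 1      # error degrees of freedom of the larger model
    return ((ssr_full - ssr_reduced) / q) / (sse_full / df_error)

# Part e: full model with all four predictors
full = ("x1", "x2", "x3", "x4")
ssr, sse = ss[full]
df_error = n - len(full) - 1                      # 54 - 4 - 1 = 49
r2 = ssr / (ssr + sse)                            # i) R-square
f_overall = (ssr / len(full)) / (sse / df_error)  # ii) overall F on (4, 49) df
s_resid = math.sqrt(sse / df_error)               # iii) residual standard deviation

# Part f: H0: beta2 = beta4 = 0, so the reduced model keeps x1 and x3
f_part_f = partial_f(full, ("x1", "x3"))
F_CRIT = 3.19  # assumed: approximate tabled F(0.05; 2, 49)
reject_h0 = f_part_f > F_CRIT

# Part g: H0: beta1 = 0 in the model x1, x2, x4; reduced model is x2, x4
f_part_g = partial_f(("x1", "x2", "x4"), ("x2", "x4"))
t_part_g = math.sqrt(f_part_g)  # positive root, since coefficients are assumed positive

print(f"e) R2 = {r2:.4f}, F = {f_overall:.1f}, s = {s_resid:.5f}")
print(f"f) F = {f_part_f:.1f}, reject H0: {reject_h0}")
print(f"g) F = {f_part_g:.4f}, t = {t_part_g:.3f}")
```

With these inputs, R² is about 0.971 and the residual standard deviation is about 0.0483, matching the full-model row of the best subsets output. The overall F is roughly 414 on (4, 49) degrees of freedom. For part f the F statistic is roughly 272, far above the tabled value, so H₀ : β₂ = β₄ = 0 is rejected at α = .05. For part g the F statistic is about 0.12, giving t of about 0.35, so x1 adds little to a model that already contains x2 and x4.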
