Refer to the Baseball 2010 data, which report information on the 30 Major League Baseball teams for the 2010 season. Let the number of games won be the dependent variable and the following variables be the independent variables: team batting average, number of stolen bases, number of errors committed, team ERA, number of home runs, and whether the team plays in the American or the National League. Add a league code variable using 0 for the National League and 1 for the American League. *** The baseball data can be obtained from http://highered.mcgraw-hill.com/sites/0073521477/student_view0/index.html (go to Data sets and download the Baseball file). Make sure you get the Excel file (.xls) or Minitab file (.MTW) ***

a. Use a statistical software package to determine the multiple regression equation. Discuss each of the variables. For example, are you surprised that the regression coefficient for ERA is negative? Is the number of wins affected by whether the team plays in the National or the American League?

b. Find the coefficient of determination for this set of independent variables.

c. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?

d. Conduct a global test on the set of independent variables. Interpret.

e. Conduct a test of hypothesis on each of the independent variables. Would you consider deleting any of the variables? If so, which ones?

f. Rerun the analysis until only significant net regression coefficients remain in the analysis. Identify these variables.

g. Develop a histogram of the residuals from the final regression equation developed in part (f). Is it reasonable to conclude that the normality assumption has been met?

h. Plot the residuals against the fitted values from the final regression equation developed in part (f). Plot the residuals on the vertical axis and the fitted values on the horizontal axis.

Solution

a) The fitted regression equation for X3 (the dependent variable) has the following coefficients:

              Estimate
(Intercept)   39.70998
x2             0.07125
x4           -16.97616
x5           392.36356
x6             0.11451
x7             0.01892
x8            -0.09910

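Here x4 and x8 have negative coefficients, so the predicted X3 falls as either one increases with the other variables held fixed, whereas x5 has by far the largest coefficient (392.36). The size of that coefficient reflects the scale on which x5 is measured and does not by itself show that x5 matters most.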
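As a rough illustration of how the fitted equation is used, the sketch below plugs values into the coefficients listed above. The input values in `example` are hypothetical and are not taken from the Baseball 2010 file; the function name `predict_x3` is also just an illustrative choice.

```python
# Coefficients copied from the estimate table above.
INTERCEPT = 39.70998
COEFFICIENTS = {
    "x2": 0.07125,
    "x4": -16.97616,
    "x5": 392.36356,
    "x6": 0.11451,
    "x7": 0.01892,
    "x8": -0.09910,
}


def predict_x3(values):
    """Return the predicted X3 for a dict mapping variable names to values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


# Hypothetical inputs, chosen only to show the call; not real team data.
example = {"x2": 1, "x4": 4.0, "x5": 0.260, "x6": 150, "x7": 100, "x8": 100}
print(round(predict_x3(example), 2))
```

Raising x4 by one unit while holding everything else fixed lowers the prediction by 16.97616, which is what a negative net regression coefficient means.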

b)

Multiple R-squared: 0.8659, Adjusted R-squared: 0.831
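The adjusted R-squared and the global F statistic can both be recovered from the multiple R-squared. The sketch below assumes n = 30 teams and k = 6 independent variables, which agrees with the 6 and 23 degrees of freedom reported in the regression output; the function names are illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def global_f_statistic(r2, n, k):
    """F statistic for the global test, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


R2 = 0.8659  # multiple R-squared reported above
N = 30       # assumed: 30 teams
K = 6        # assumed: six independent variables

adj_r2 = adjusted_r_squared(R2, N, K)
f_stat = global_f_statistic(R2, N, K)
print(round(adj_r2, 3), round(f_stat, 2))
```

Small differences from the software output can appear because R-squared is rounded to four decimals here.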

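c) The correlation matrix of the independent variables is

             x2           x4           x5           x6          x7          x8
x2    1.0000000   0.14478159   0.22415352   0.11408183   0.2703244  -0.0155255
x4    0.1447816   1.00000000   0.05804993   0.08710859  -0.2034245   0.4799303
x5    0.2241535   0.05804993   1.00000000   0.31717914  -0.1762533  -0.1656905
x6    0.1140818   0.08710859   0.31717914   1.00000000  -0.3077394  -0.2793425
x7    0.2703244  -0.20342447  -0.17625331  -0.30773937   1.0000000  -0.1326606
x8   -0.0155255   0.47993032  -0.16569051  -0.27934248  -0.1326606   1.0000000

The largest correlation between two different independent variables is 0.48 (x4 and x8), so none of the pairs is strongly correlated and multicollinearity does not appear to be a problem. This matrix covers only the independent variables, so it does not show their correlations with the dependent variable X3.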
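A quick way to screen the matrix for multicollinearity is to scan the off-diagonal entries for large absolute values. The sketch below uses the values from the matrix above; the cutoff of 0.70 is a common rule of thumb and is an assumption here, not something stated in the solution.

```python
NAMES = ["x2", "x4", "x5", "x6", "x7", "x8"]

# Correlation matrix copied from the output above.
CORR = [
    [1.0000000, 0.14478159, 0.22415352, 0.11408183, 0.2703244, -0.0155255],
    [0.1447816, 1.00000000, 0.05804993, 0.08710859, -0.2034245, 0.4799303],
    [0.2241535, 0.05804993, 1.00000000, 0.31717914, -0.1762533, -0.1656905],
    [0.1140818, 0.08710859, 0.31717914, 1.00000000, -0.3077394, -0.2793425],
    [0.2703244, -0.20342447, -0.17625331, -0.30773937, 1.0000000, -0.1326606],
    [-0.0155255, 0.47993032, -0.16569051, -0.27934248, -0.1326606, 1.0000000],
]

THRESHOLD = 0.70  # assumed rule-of-thumb cutoff for a "strong" correlation

# Collect each distinct pair once (upper triangle only).
pairs = [
    (NAMES[i], NAMES[j], CORR[i][j])
    for i in range(len(NAMES))
    for j in range(i + 1, len(NAMES))
]

strong_pairs = [p for p in pairs if abs(p[2]) > THRESHOLD]
largest_pair = max(pairs, key=lambda p: abs(p[2]))

print("pairs above threshold:", strong_pairs)
print("largest off-diagonal correlation:", largest_pair)
```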

d), e) Tests for the independent variables

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.70998 25.66452 1.547 0.135448
x2 0.07125 1.85861 0.038 0.969749
x4 -16.97616 2.41704 -7.024 3.71e-07 ***
x5 392.36356 89.13989 4.402 0.000207 ***
x6 0.11451 0.02971 3.854 0.000807 ***
x7 0.01892 0.03180 0.595 0.557558
x8 -0.09910 0.06026 -1.645 0.113637
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.525 on 23 degrees of freedom
F-statistic: 24.76 on 6 and 23 DF, p-value: 5.973e-09

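Global test (d): F = 24.76 on 6 and 23 degrees of freedom, with a p-value of 5.973e-09, so we reject the hypothesis that all the regression coefficients are zero. At least one independent variable is related to X3.

Individual tests (e): at the 5% level of significance the hypothesis of a zero coefficient is rejected for x4, x5 and x6 (p-values 3.71e-07, 0.000207 and 0.000807). The coefficients of x2, x7 and x8 are not significant (p-values 0.970, 0.558 and 0.114), so these variables are candidates for deletion.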
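The individual tests can be reproduced from the estimates and standard errors in the coefficient table: each t value is the estimate divided by its standard error, compared with the critical value for 23 degrees of freedom. The sketch below hard-codes 2.069 as the two-tailed 5% critical value from a t table; that constant and the variable names are assumptions made for the illustration.

```python
# (estimate, standard error) pairs copied from the coefficient table above.
COEF_TABLE = {
    "x2": (0.07125, 1.85861),
    "x4": (-16.97616, 2.41704),
    "x5": (392.36356, 89.13989),
    "x6": (0.11451, 0.02971),
    "x7": (0.01892, 0.03180),
    "x8": (-0.09910, 0.06026),
}

# Assumed: two-tailed critical t value at the 5% level with 23 degrees of freedom.
T_CRITICAL = 2.069

# t statistic for each coefficient: estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in COEF_TABLE.items()}

# A coefficient is significant when |t| exceeds the critical value.
significant = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]
not_significant = [name for name in t_values if name not in significant]

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
print("significant at 5%:", significant)
print("not significant at 5%:", not_significant)
```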
