... several variables on total winnings of 100 randomly selected PGA golfers in 2004. The independent variables are:

Variable       Description
Age            Player's age in years
AvgDriveYds    Average length of drive in yards
DriveAcc       % of drives landing in the fairway
GreensReg      % of greens reached in regulation (i.e., par-2)
AvgNumPutts    Average number of putts taken per round
SavePct        % of pars saved when off the green in regulation
NumEvents      Number of golf tournaments played

The dependent variable is:

Dependent Var       Description
TotWinnings1000s    Total winnings in thousands of $

A multiple linear regression model was created using the above variables. Here are the main results:

Regression Analysis
R2             0.435
Adjusted R2    0.392
R              0.659
n              100
Std. Error     1058.283
Dep. Var.      TotWinnings1000s

ANOVA table
Source        SS             df    MS            F       p-value
Regression    79,221,681     7     11,317,383    10.11   2.55E-09
Residual      103,036,680    92    1,119,964
Total         182,258,361    99

Regression output
variables      coefficients       std. error   t (df=92)   p-value    VIF
Intercept      18,021.42          11,087.17    1.625       0.1075
Age            6.77               19.34        0.350       0.7272     1.297
AvgDriveYds    -22.61             21.52        -1.051      0.2962     2.774
DriveAcc       -91.37             33.73        -2.709      0.0081     2.736
GreensReg      290.32             49.95        5.813       8.78E-08   1.775
AvgNumPutts    -13,745.68         4,816.89     -2.854      0.0053     1.198
SavePct        32.02              21.63        1.481       0.1421     1.201
NumEvents      (not recoverable)  24.25        1.000       0.3199     1.069
                                                           mean VIF   1.722

a) (6 pts) Write down the fitted regression model. Only include variables that are statistically significant.

Solution

a) At alpha = 0.05, the significant variables are the ones with p-value < 0.05:

DriveAcc (p-value = 0.0081), GreensReg (p-value = 8.78 x 10^-8), AvgNumPutts (p-value = 0.0053)

Using the coefficients from the regression output, the fitted model is:

TotWinnings1000s = 18,021.42 - 91.37 DriveAcc + 290.32 GreensReg - 13,745.68 AvgNumPutts

b) TotWinnings1000s = 18,021.42 - 91.37 * 64.1 + 290.32 * 65.1 - 13,745.68 * 1.749

= 7,023.24 ($000), i.e. about $7.02 million
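The arithmetic in part (b) can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `predict_winnings` are illustrative choices, not part of the original problem. The coefficients and the golfer's values are the ones used above.

```python
# Coefficients of the reduced model from part (a), in thousands of dollars
coef = {
    "Intercept": 18021.42,
    "DriveAcc": -91.37,
    "GreensReg": 290.32,
    "AvgNumPutts": -13745.68,
}

# Predictor values for the golfer in part (b)
golfer = {"DriveAcc": 64.1, "GreensReg": 65.1, "AvgNumPutts": 1.749}


def predict_winnings(coef, x):
    """Intercept plus the sum of coefficient * value for each predictor."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in x.items())


y_hat = predict_winnings(coef, golfer)
print(f"Predicted winnings: {y_hat:,.2f} ($000)")
```

The result is in thousands of dollars, the unit of the dependent variable.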

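c) alpha = 0.05

y predicted = 7,023.24 ($000)

Std Error = 1058.283

n = 100; df = n - (k+1) = 100 - (7 + 1) = 92

t crit = 1.986

Approximate prediction interval = y predicted ± t crit * std error

= 7,023.24 ± 1.986 * 1058.283

= 7,023.24 ± 2,101.75

= (4,921.5 , 9,125.0) in $000

This uses only the standard error of the estimate, so it is an approximation. The exact prediction interval is slightly wider because it also accounts for the uncertainty in the estimated coefficients.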
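A quick sketch of the interval calculation in part (c). It assumes the point prediction of about 7,023.24 ($000) from part (b) and takes the critical value t = 1.986 (df = 92) from a t table instead of computing it. The variable names are illustrative.

```python
# Inputs from the solution; all money amounts are in thousands of dollars
y_hat = 7023.24    # point prediction from part (b)
std_err = 1058.283 # standard error of the estimate from the regression output
t_crit = 1.986     # two-sided 95% critical value for df = 92 (from a t table)

# Approximate prediction interval: y_hat +/- t_crit * std_err
margin = t_crit * std_err
lower, upper = y_hat - margin, y_hat + margin

print(f"margin = {margin:,.2f}")
print(f"approximate 95% prediction interval = ({lower:,.1f}, {upper:,.1f})")
```

The exact interval would multiply the standard error by a factor slightly larger than 1 that depends on how far the golfer's predictor values are from the sample means. That factor cannot be computed from the summary output alone.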

d) SavePct

coeff = 32.02

Std Error = 21.63

t crit for alpha = 0.05 uses the residual degrees of freedom, df = n - (k+1) = 92, so t crit = ± 1.986

Confidence interval = 32.02 ± 1.986 * 21.63

= 32.02 ± 42.96

= (-10.9 , 75.0)

Since the confidence interval contains 0, the coefficient of SavePct is not significantly different from zero at alpha = 0.05.
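The confidence interval in part (d) can be reproduced as below. This sketch assumes the SavePct coefficient of 32.02 and standard error of 21.63 as read from the regression output, and a table value of t = 1.986 for the residual degrees of freedom (92). The helper name `coef_interval` is made up for illustration.

```python
def coef_interval(coef, std_err, t_crit):
    """Two-sided confidence interval for one regression coefficient."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


# SavePct values as read from the regression output
lower, upper = coef_interval(coef=32.02, std_err=21.63, t_crit=1.986)

# The coefficient is not significant at this level if the interval covers zero
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for SavePct: ({lower:.1f}, {upper:.1f})")
print("contains zero:", contains_zero)
```

This agrees with the t test in the output: the p-value for SavePct (0.1421) is above 0.05, so the interval has to cover zero.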

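e) True. The F test for the overall model has p-value = 2.55E-09, which is far below alpha = 0.05, so we reject the null hypothesis that all slope coefficients are zero. At least one independent variable is significantly related to total winnings.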
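The F statistic behind part (e) follows directly from the ANOVA sums of squares. The sketch below recomputes it, assuming k = 7 predictors and n = 100 golfers as in the problem. The p-value is taken from the output, not recomputed, because that needs an F distribution function from a statistics library.

```python
# Sums of squares from the ANOVA table
ss_regression = 79_221_681
ss_residual = 103_036_680
n, k = 100, 7

# Mean squares: each sum of squares divided by its degrees of freedom
ms_regression = ss_regression / k        # df = k
ms_residual = ss_residual / (n - k - 1)  # df = n - k - 1 = 92

f_stat = ms_regression / ms_residual
print(f"MSR = {ms_regression:,.0f}, MSE = {ms_residual:,.0f}, F = {f_stat:.2f}")
```

The mean squares and F reproduce the values in the ANOVA table, which also confirms the degrees of freedom of 7 and 92.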

f) R2 is the proportion of the variance in the dependent variable that is explained by the independent variables in the model.

Adjusted R2 has been adjusted for the number of predictors in the model.

If a variable is added, R2 never decreases, even when the new variable is useless. Adjusted R2 increases only if the new term improves the model more than would be expected by chance, and it can decrease otherwise.

g) 43.5% of the variation in total winnings is explained by the model, since R2 = 0.435
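Both R2 and adjusted R2 can be recovered from the ANOVA sums of squares, which makes the penalty for the number of predictors explicit. A minimal sketch, assuming n = 100 and k = 7 as in the problem:

```python
import math

# Sums of squares from the ANOVA table
ss_regression = 79_221_681
ss_total = 182_258_361
n, k = 100, 7

# R2: share of total variation explained by the regression
r2 = ss_regression / ss_total

# Adjusted R2: penalizes R2 for the number of predictors k
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Multiple correlation coefficient R
r = math.sqrt(r2)

print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, R = {r:.3f}")
```

The factor (n - 1) / (n - k - 1) grows with k, so a new predictor raises adjusted R2 only if it lowers the unexplained share (1 - R2) by enough to offset that growth.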

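h) Multicollinearity means that the independent variables are correlated with one another. It does not inflate R2. It inflates the standard errors of the affected coefficients, which makes the individual t tests unreliable and the coefficient estimates unstable.

No, it is not a problem here. The overall significance of the model does not tell us this; the VIFs do. Every VIF in the regression output is below 3 (the largest is 2.774 and the mean VIF is 1.722), well under the usual warning thresholds of 5 or 10.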
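A VIF screen like the one relevant to part (h) can be sketched as follows. The VIF values are a best-effort reading of the garbled regression output, and their assignment to variables is an assumption. The threshold of 5 is a common rule of thumb, not something stated in the problem.

```python
# VIFs as best they can be read from the regression output (assumed mapping)
vifs = {
    "Age": 1.297,
    "AvgDriveYds": 2.774,
    "DriveAcc": 2.736,
    "GreensReg": 1.775,
    "AvgNumPutts": 1.198,
    "SavePct": 1.201,
    "NumEvents": 1.069,
}

THRESHOLD = 5.0  # rule-of-thumb cutoff; some texts use 10

mean_vif = sum(vifs.values()) / len(vifs)
flagged = [name for name, v in vifs.items() if v >= THRESHOLD]

print(f"max VIF = {max(vifs.values()):.3f}, mean VIF = {mean_vif:.3f}")
print("variables above threshold:", flagged)
```

With these values the mean comes out close to the 1.722 reported in the output and no variable is flagged. The highest VIFs belong to AvgDriveYds and DriveAcc.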

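i) Residual plots let you check whether the observed errors (residuals) behave like random (stochastic) error: no pattern against the fitted values, roughly constant spread, and an approximately normal distribution.

For this model the residuals are consistent with random error.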
