LINEAR REGRESSION – Use the sample to complete this section. Remember, the X variable is NUMBER OF RUNS SCORED, and the Y variable is NUMBER OF WINS.
Interpreting the regression output.
i. The regression equation is: ____________________________________
ii. Explain the exact meaning of the slope of the regression equation:
iii. Explain the exact meaning of the y-intercept of the regression equation:
iv. Explain the exact meaning of the standard error of the estimate:
v. Explain the exact meaning of the coefficient of determination:
vi. Predict the number of wins for a team that scores 670 runs (round off to the nearest integer). _______________
| Runs Scored (X) | Wins (Y) |
| --- | --- |
| 708 | 69 |
| 875 | 90 |
| 654 | 79 |
| 704 | 80 |
| 787 | 95 |
| 730 | 71 |
| 667 | 86 |
| 619 | 63 |
| 867 | 97 |
| 645 | 74 |
| 556 | 67 |
| 707 | 91 |
| 855 | 96 |
| 743 | 81 |
| 731 | 94 |
| 641 | 89 |
| 654 | 71 |
| 735 | 79 |
| 735 | 73 |
| 615 | 56 |
Solution
Let the linear regression equation of Y on X be Y = a + bX, where Y = number of wins, X = number of runs scored, a = y-intercept, and b = slope.
a and b are estimated by the method of least squares, which gives:
b = r*sy/sx, where r is the correlation coefficient between X and Y, sy is the standard deviation of Y, and sx is the standard deviation of X;
a = ybar - b*xbar, where ybar and xbar denote the means of Y and X respectively.
i) With xbar = 711.4 and ybar = 80.05 for the 20 teams in the sample, the regression equation comes out to be
Y = 10.9 + 0.0972*X [answer]
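The least-squares formulas for the slope and intercept can be checked directly against the sample. The following is a minimal sketch, assuming the 20 data pairs from the table; the function name `least_squares_fit` is only illustrative, and sample standard deviations (n - 1 divisor) are used for sx and sy.

```python
from statistics import mean, stdev

# Sample data from the table: runs scored (X) and wins (Y)
runs = [708, 875, 654, 704, 787, 730, 667, 619, 867, 645,
        556, 707, 855, 743, 731, 641, 654, 735, 735, 615]
wins = [69, 90, 79, 80, 95, 71, 86, 63, 97, 74,
        67, 91, 96, 81, 94, 89, 71, 79, 73, 56]


def least_squares_fit(x, y):
    """Return (a, b) for Y = a + b*X using b = r*sy/sx and a = ybar - b*xbar."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations
    # Sample covariance and correlation coefficient r
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    r = cov / (sx * sy)
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b


a, b = least_squares_fit(runs, wins)
print(f"Y = {a:.1f} + {b:.4f}*X")
```

Rounded to the precision used in this solution, the fitted coefficients should agree with the equation in part i).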
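ii) On the fitted line, the predicted value at X is Y = a + bX,
and the predicted value at X+1 is Y' = a + b(X+1).
So Y' - Y = b.
More generally, for any two points (X1, Y1) and (X2, Y2) on the line, b = (Y2 - Y1)/(X2 - X1).
Hence the slope is the change in the predicted value of Y per unit increase in X.
Here b = 0.0972, which means that each additional run scored is associated with an estimated increase of 0.0972 wins on average, or roughly one extra win for every 10 additional runs. [answer]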
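iii) Now we have Y = a + bX.
When X = 0 we have Y = a.
So the y-intercept a is the predicted value of Y when X is zero.
Here a = 10.9, which means that a team scoring 0 runs would be predicted to win 10.9 games. Since the sample contains only teams that scored between 556 and 875 runs, X = 0 lies far outside the data and this value has no practical meaning; it only fixes the position of the line. [answer]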
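The meaning of the slope and the intercept can be illustrated numerically. This is a small sketch that assumes the rounded coefficients a = 10.9 and b = 0.0972 from part i); the function name `predicted_wins` is hypothetical.

```python
def predicted_wins(runs_scored, a=10.9, b=0.0972):
    """Predicted wins from the fitted line Y = a + b*X (rounded coefficients)."""
    return a + b * runs_scored


# Intercept: the prediction at X = 0 (far outside the observed range)
at_zero = predicted_wins(0)

# Slope: the change in predicted wins for one extra run
per_run = predicted_wins(701) - predicted_wins(700)

# The same slope scaled up: the change for 100 extra runs
per_100_runs = predicted_wins(800) - predicted_wins(700)

print(at_zero, per_run, per_100_runs)
```

The difference between predictions does not depend on the starting number of runs, which is what it means for the line to have a constant slope.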
iv) Here the estimated quantity is Y, because we are estimating Y from X using the regression equation.
Each team has a residual e = y - Y, the difference between its observed number of wins y and the number of wins Y predicted by the regression equation.
The standard error of the estimate is the standard deviation of these residuals: se = sqrt(sum of e^2/(n-2)), where n is the number of teams.
So the standard error of the estimate measures the typical distance between the observed values of y and the regression line, in the same units as Y. For this sample (n = 20) it is about 8.8, meaning the observed number of wins typically differs from the predicted number by roughly 9 wins. [answer]
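The standard error of the estimate is commonly computed as the standard deviation of the residuals with n - 2 degrees of freedom. The sketch below follows that definition, assuming the 20 data pairs from the table; all variable names are illustrative.

```python
from math import sqrt

# Sample data from the table: runs scored (X) and wins (Y)
runs = [708, 875, 654, 704, 787, 730, 667, 619, 867, 645,
        556, 707, 855, 743, 731, 641, 654, 735, 735, 615]
wins = [69, 90, 79, 80, 95, 71, 86, 63, 97, 74,
        67, 91, 96, 81, 94, 89, 71, 79, 73, 56]

n = len(runs)
xbar = sum(runs) / n
ybar = sum(wins) / n

# Least-squares slope and intercept
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(runs, wins))
sxx = sum((x - xbar) ** 2 for x in runs)
b = sxy / sxx
a = ybar - b * xbar

# Residual = observed wins minus wins predicted by the line
residuals = [y - (a + b * x) for x, y in zip(runs, wins)]
sse = sum(e ** 2 for e in residuals)

# Divide by n - 2 because two parameters (a and b) were estimated
std_error = sqrt(sse / (n - 2))
print(f"standard error of the estimate: {std_error:.2f}")
```

The result is in units of wins, so it can be read as the typical size of a prediction miss for a team in this sample.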
v) The coefficient of determination is R^2 = r^2, the square of the correlation coefficient r.
We know R^2 = V(Y)/V(y), where y is the original variable and Y is the value of y predicted by the regression equation.
So R^2 is the proportion of the total variation in y that is explained by the linear regression of Y on X.
R^2 lies between 0 and 1; the closer it is to 1, the better the regression line fits the data.
Here R^2 = 0.489,
which means that about 48.9% of the variation in the number of wins is explained by the number of runs scored; the remaining 51.1% is left unexplained by the regression equation. [answer]
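The ratio of explained variation to total variation can be computed directly from the sample. This is a minimal sketch assuming the 20 data pairs from the table; the variable names are illustrative.

```python
# Sample data from the table: runs scored (X) and wins (Y)
runs = [708, 875, 654, 704, 787, 730, 667, 619, 867, 645,
        556, 707, 855, 743, 731, 641, 654, 735, 735, 615]
wins = [69, 90, 79, 80, 95, 71, 86, 63, 97, 74,
        67, 91, 96, 81, 94, 89, 71, 79, 73, 56]

n = len(runs)
xbar = sum(runs) / n
ybar = sum(wins) / n

# Least-squares slope and intercept
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(runs, wins))
sxx = sum((x - xbar) ** 2 for x in runs)
b = sxy / sxx
a = ybar - b * xbar

# Variation of the predicted values around the mean (explained)
predicted = [a + b * x for x in runs]
explained = sum((p - ybar) ** 2 for p in predicted)

# Variation of the observed values around the mean (total)
total = sum((y - ybar) ** 2 for y in wins)

r_squared = explained / total
print(f"R^2 = {r_squared:.3f}")
```

Because both sums use the same divisor, the ratio of sums of squares equals the ratio of variances V(Y)/V(y).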
vi) The predicted number of wins for a team that scores 670 runs is found by putting X = 670 in the regression equation:
Y = 10.9 + 0.0972*670 = 76.024, which rounds to 76 wins [answer]
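The substitution can be reproduced in a few lines. The sketch assumes the rounded coefficients from part i); note that 670 runs lies inside the observed range of 556 to 875 runs, so this is an interpolation and not an extrapolation.

```python
# Rounded coefficients of the fitted line Y = a + b*X
a, b = 10.9, 0.0972

runs_scored = 670
estimate = a + b * runs_scored  # about 76.024 wins

# Round to the nearest integer, as the question asks
predicted = round(estimate)
print(predicted)  # 76
```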

