The dataset NavyLaborHours.csv reflects information from 17 U.S. Navy hospitals at various sites around the world. The predictors are workload variables, that is, items that result in the need for personnel in a hospital. A brief description of the variables is as follows:
y = monthly labor hours
x.1 = average daily patient load
x.2 = monthly X-ray exposures
x.3 = eligible population in the area per 1000
x.4 = average length of patient's stay in days
a) (3 points) Provide and describe a scatterplot matrix. Are there any relationships between the response variable and the predictors? Are there any visible relationships between the predictors?
b) (2 points) Fit a least squares regression model using all variables. From the Model Utility F test, is there evidence that at least one variable is a significant predictor of monthly labor hours at the U.S. Navy hospitals? State the F test statistic, degrees of freedom and p-value associated with the test. Assume a significance level of 0.05. Include R summary output.
c) (2 points) How well does the model fit the data? Give the R2 and adjusted R2 values. Which value makes more sense to report and why? Hint: Check out the R2 section on pages 559-560 in the text.
d) (2 points) Give the estimated least squares regression equation. Please round your estimates to two decimal places.
e) (1 point) Provide a labor hours prediction for when:
x.1 = average daily patient load = 94.39
x.2 = monthly X-ray exposures = 8461
x.3 = eligible population in the area per 1000 = 78.7
x.4 = average length of patients’ stay in days = 6.18
f) (1 point) In part e) the observed labor hours for the above predictor values is 1243.90. How far off is the predicted value from the observed value? State the residual.
g) (2 points) Using the output from 2b), check the p-values from the individual t tests on the slopes. Which individual variables are significant at the 0.05 significance level?
h) (3 points) Fit a new model after removing the explanatory variable with the largest non-significant p-value (only one at a time). Continue this process until you have all significant predictors. Take notice of how the p-values change as you remove each variable. What is your new model? Provide the R output for the model summary. Hint: Should reduce to only one explanatory variable.
i) (3 points) Interpret the slope of your model, include a 95% confidence interval for the slope, and interpret.
j) (2 points) Plot the residuals from the model. Are conditions satisfied? Briefly describe the plot.
k) (2 points) How well does the model fit the data? Give the R2 and adjusted R2. How do these compare to 2c?
l) (4 points) Provide a scatterplot with a title and the best fit line.
m) (1 point) Provide a monthly labor hours prediction for when (choose the variable that applies to your simple model from 3n):
x.1 = average daily patient load = 94.39
x.2 = monthly X-ray exposures = 8461
x.3 = eligible population in the area per 1000 = 78.7
x.4 = average length of patients’ stay in days = 6.18
b) (1 point) In part a. the observed labor hours for the above predictor values is 1243.90. How far off is the predicted value from the observed value? State the residual.
o) (2 points) Calculate the confidence interval and prediction interval for the predicted monthly labor hours for the same value as in 4a. Helpful R code:
predict(mod, data.frame(x.1 = 94.39, x.2 = 8461, x.3 = 78.7, x.4 = 6.18), interval = "confidence", level = 0.95)
predict(mod, data.frame(x.1 = 94.39, x.2 = 8461, x.3 = 78.7, x.4 = 6.18), interval = "prediction", level = 0.95)
# All you need to change in the code above is the name of your simple model. Here the name is “mod”.
# This returns a confidence/prediction interval for the response given the value of x that applies.
p) (4 points) Interpret both of the intervals from 4c. How do they differ numerically and conceptually?
Solution
b) The regression model is
y = a + b*x1 + c*x2 + d*x3 + e*x4
where
y = monthly labor hours
x.1 = average daily patient load
x.2 = monthly X-ray exposures
x.3 = eligible population in the area per 1000
x.4 = average length of patient's stay in days
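From the Excel regression output (reproduced at the end of this solution), the fitted regression is
y = 79.09 + 12.52 x1 + 0.038 x2 - 4.86 x3 + 18.44 x4
The Model Utility F test gives F = 73.15 on (4, 12) degrees of freedom, with p-value = 2.53E-08 (about 0.000000025).
Since p < alpha = 0.05, we reject the null hypothesis that all slopes are zero: there is evidence that at least one variable is a significant predictor of monthly labor hours.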
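As a cross-check on the F test, the statistic and its p-value can be recomputed from the sums of squares in the ANOVA table. The sketch below is in Python rather than R or Excel, and the helper `f_upper_tail` is a hypothetical name introduced here. It relies on a closed form for the F upper tail that holds only when both degrees of freedom are even, which happens to be the case for (4, 12).

```python
from math import comb

# Values copied from the ANOVA rows of the Excel output.
ss_regression, df_regression = 59083839.44, 4
ss_residual, df_residual = 2423282.068, 12

# F = mean square for regression / mean square for residuals.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)


def f_upper_tail(f, d1, d2):
    """P(F > f) for an F(d1, d2) variable; valid only when d1 and d2 are even.

    Uses the identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1*f),
    where the regularized incomplete beta function with integer arguments
    reduces to a binomial tail sum.
    """
    a, b = d2 // 2, d1 // 2
    x = d2 / (d2 + d1 * f)
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


p_value = f_upper_tail(f_stat, df_regression, df_residual)
print(f"F = {f_stat:.5f}, p-value = {p_value:.3g}")
```

For general degrees of freedom a statistics library (for example R's `pf` with `lower.tail = FALSE`) would be the more usual route.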
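c) Coefficient of determination: R2 = 0.96 (0.9606), so the model explains about 96% of the variation in monthly labor hours.
Adjusted R2 = 0.95 (0.9475). With four predictors and only 17 observations, adjusted R2 is the more sensible value to report, because it accounts for the number of predictors in the model.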
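Both fit measures follow directly from the ANOVA sums of squares, so they can be verified by hand. A minimal Python sketch, assuming the residual and total sums of squares and the sample size printed in the Excel output (the variable names are illustrative):

```python
# Values copied from the Excel output.
ss_residual = 2423282.068
ss_total = 61507121.5
n_obs, n_predictors = 17, 4

# R^2: share of total variation explained by the regression.
r_squared = 1 - ss_residual / ss_total

# Adjusted R^2: rescales the unexplained share by (n - 1) / (n - k - 1),
# which penalizes each additional predictor.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
```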
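d)
The fitted regression, with estimates rounded to two decimal places, is
y = 79.09 + 12.52 x1 + 0.04 x2 - 4.86 x3 + 18.44 x4
(The x2 coefficient is 0.038 to three decimal places.)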
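e)
When
x.1 = average daily patient load = 94.39
x.2 = monthly X-ray exposures = 8461
x.3 = eligible population in the area per 1000 = 78.7
x.4 = average length of patients’ stay in days = 6.18
the prediction from the fitted regression, using the unrounded coefficients from the Excel output, is
y = 79.09251 + 12.52297*94.39 + 0.037996*8461 - 4.86374*78.7 + 18.43857*6.18
y = 1313.79
With the coefficients truncated to 79.09, 12.52, 0.037, -4.86 and 18.43, the same calculation gives 1305.32. The gap arises because x2 is in the thousands, so its coefficient needs more digits than the others.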
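The prediction in part e) and the residual asked for in part f) can be reproduced with a few lines of arithmetic. The sketch below is in Python and assumes the unrounded coefficients from the Excel output, so its result can differ by several labor hours from a hand calculation with rounded coefficients. The residual is taken as observed minus predicted, with the observed value 1243.90 from the question.

```python
# Unrounded coefficients copied from the Excel output.
intercept = 79.09251
coefficients = {"x1": 12.52297, "x2": 0.037996, "x3": -4.86374, "x4": 18.43857}

# Predictor values given in part e).
new_hospital = {"x1": 94.39, "x2": 8461, "x3": 78.7, "x4": 6.18}

# Predicted labor hours: intercept plus each coefficient times its predictor.
predicted = intercept + sum(
    coefficients[name] * new_hospital[name] for name in coefficients
)

# Residual = observed - predicted, using the observed value from part f).
observed = 1243.90
residual = observed - predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A negative residual means the model overpredicts labor hours for this hospital.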
SUMMARY OUTPUT

| Regression Statistics | |
| --- | --- |
| Multiple R | 0.980103 |
| R Square | 0.960602 |
| Adjusted R Square | 0.947469 |
| Standard Error | 449.3775 |
| Observations | 17 |

ANOVA

| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 4 | 59083839.44 | 14770960 | 73.14523 | 2.53E-08 |
| Residual | 12 | 2423282.068 | 201940.2 | | |
| Total | 16 | 61507121.5 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
| --- | --- | --- | --- | --- | --- |
| Intercept | 79.09251 | 484.2098757 | 0.163343 | 0.872967 | -975.91 |
| x1 | 12.52297 | 1.306324519 | 9.586415 | 5.64E-07 | 9.676733 |
| x2 | 0.037996 | 0.042753294 | 0.888726 | 0.391617 | -0.05516 |
| x3 | -4.86374 | 3.158630287 | -1.53983 | 0.149547 | -11.7458 |
| x4 | 18.43857 | 88.01621573 | 0.209491 | 0.83758 | -173.332 |
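The coefficient rows of this output also answer part g). Each t statistic is the coefficient divided by its standard error, and a slope is significant at the 0.05 level when its p-value falls below 0.05. A small Python sketch, assuming the values are copied from the coefficient rows of the Excel output (the p-values are taken as printed, not recomputed):

```python
# (coefficient, standard error, p-value) for each slope, from the Excel output.
slopes = {
    "x1": (12.52297, 1.306324519, 5.64e-07),
    "x2": (0.037996, 0.042753294, 0.391617),
    "x3": (-4.86374, 3.158630287, 0.149547),
    "x4": (18.43857, 88.01621573, 0.83758),
}
alpha = 0.05

# t statistic for each slope: estimate divided by its standard error.
t_stats = {name: coef / se for name, (coef, se, _) in slopes.items()}

# Slopes whose individual t test rejects H0: slope = 0 at level alpha.
significant = [name for name, (_, _, p) in slopes.items() if p < alpha]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}, p = {slopes[name][2]:.3g}")
print("significant at 0.05:", significant)
```

Only x1 (average daily patient load) passes, which is consistent with the hint in part h) that the backward elimination should end with a single explanatory variable.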


