Box Offices. Based on data available on www.boxofficemojo.com, of the movies released in the United States between 2005 and 2014, 2,177 movies grossed over one million dollars. For this problem we will consider these 2,177 movies as the population of interest. Because of this assumption, the mean of all 2,177 movies will correspond to the population mean. The purpose of this problem is to illustrate that, although we will not know the population mean µ in real-world situations, we can use the sample mean x̄ and obtain an estimate fairly close to µ with high probability. We wish to estimate the parameter µ, the mean domestic gross box-office sales (in millions of dollars). In Blackboard, there are three files that you will need for this problem: movies.jmp, clt30.jsl and clt100.jsl.
First download all three files and save them to your desktop. (Note: if you open the files directly from the web page, you will most likely receive an error message, or the script file may not run when you start using it in part (f).)
(a) (Free Response) Start JMP and open the data file movies.jmp. The data set contains the name of the movie, the domestic gross, and the year the movie was released. Create a histogram for the variable domestic gross. Describe the shape/pattern of the distribution. (You do not need to save or upload this histogram.)
(b) Report the mean and standard deviation for domestic gross for all 2,177 movies. (Report to the nearest two decimal places.)
(c) (T/F) The values in part (b) are sample statistics.
(d) (Free Response) Before proceeding, make sure no rows or columns are highlighted in your data table. Now select a random sample of size n = 30 by clicking on Tables > Subset. Then select Random - sample size: and change it to 30. Click OK. Repeat this process two more times and record all three means. Additionally, describe how close your sample means are to the mean obtained in part (b).
(e) (Free Response) Imagine we repeat part (d) 100 times and create a histogram of the observed sample means. Explain why the shape of this distribution ought to resemble that of a Normal distribution.
(f) (Free Response) Close any windows other than the original data table and open the file clt30.jsl. If the script does not run automatically, then use Edit > Run Script to run the script. This script randomly selects 100 samples of size n = 30 and calculates the sample mean for each. All 100 means are stored in a new data table called "Sample Summaries." Plot the sample means in a histogram. (You do not need to save or upload this histogram.) Describe the overall shape of the distribution. Include the mean and standard deviation of all 100 means in your description. (Note that because the distribution of 100 sample means is an estimate of the sampling distribution, the mean that you reported is an estimate of the mean of x̄, as noted in the second bullet point of the CLT recap at the beginning of this assignment.)
(g) (Free Response) Compare the three sample means that you found in part (d) and your estimate of µ_x̄ from part (f) to the population mean from part (b). Your estimate from part (f) will most likely be closer to the population mean than each of the three individual sample means. Explain why this is most likely the case, i.e. why this should happen for the vast majority of all students doing the assignment.
(h) What is the value of the standard deviation of the sampling distribution? (Hint: consider the third bullet point of the CLT recap.)
(i) (Free Response) Compare the value that you obtained for the standard deviation in part (f) with the value you found in part (h). (Keep in mind that this value can only be approximately accurate, as we are dealing with a subset of random samples, and not all possible samples of size n = 30.)
(j) (Free Response) Referring again to part (f), if your estimated sampling distribution looked somewhat skewed, explain why this might be. (Hint: Look at the population distribution and its main features. Next recall that we are selecting simple random samples. By definition, every sample of size n has the same chance of being selected. What does this imply about the chance that any features of the population distribution will get selected as part of a random sample of size n and therefore also be present in the sample?)
(k) Close any open windows in JMP and do not save any changes to the original file movies.jmp. (If you continue any further analyses with any of the previously used/generated output windows open, you will most likely get an error message!) Reopen the file movies.jmp in JMP. Now run the script clt100.jsl. It will create samples of size n = 100 (as opposed to n = 30) and compute the corresponding sample means.
i. (Free Response) Create a histogram of the sample means. Upload your histogram to Blackboard as a .png, .jpg, or other image file. Do not upload a .jrp or you will receive no credit for this question.
ii. (Free Response) Describe the overall shape of the distribution. Include the mean and standard deviation in your description.
(l) (Free Response) Compare the population mean to the 100 sample means you found when n = 30 and n = 100 (i.e. when you ran clt30.jsl and clt100.jsl). Identify which group of sample means (n = 30 or n = 100) tends to be closer to the unknown population mean µ, and explain why we can expect this to be the case.
(m) (Free Response) In a short paragraph (4 to 5 sentences), summarize what you have learned about estimating the unknown population mean µ using a random sample.
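For readers without access to JMP, the resampling that parts (f) and (k) describe can be roughly imitated in Python. The sketch below is not the clt30.jsl or clt100.jsl script and does not use movies.jmp: it builds a hypothetical right-skewed population of 2,177 values (a shifted lognormal chosen purely for illustration), draws 100 simple random samples of size n, and reports the mean and standard deviation of the 100 sample means next to the population mean and σ/√n. The function names make_population, sample_means and summarize, the lognormal parameters, and the seeds are all assumptions made for this example.

```python
import math
import random
import statistics


def make_population(size=2177, seed=0):
    """Hypothetical stand-in for the domestic gross column (in $ millions).

    Assumption: a lognormal shape shifted by 1, so values are right-skewed
    and exceed one million dollars. This is NOT the real movies.jmp data.
    """
    rng = random.Random(seed)
    return [1.0 + rng.lognormvariate(2.5, 1.0) for _ in range(size)]


def sample_means(population, n, reps=100, seed=1):
    """Draw `reps` simple random samples of size n (without replacement
    within each sample) and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


def summarize(population, n, reps=100, seed=1):
    """Compare the simulated sample means with the population values."""
    means = sample_means(population, n, reps, seed)
    sigma = statistics.pstdev(population)  # population standard deviation
    return {
        "n": n,
        "population_mean": statistics.fmean(population),
        "mean_of_sample_means": statistics.fmean(means),
        "sd_of_sample_means": statistics.stdev(means),
        # CLT value for the standard deviation of the sample mean; the small
        # finite-population correction is ignored here.
        "sigma_over_sqrt_n": sigma / math.sqrt(n),
    }


if __name__ == "__main__":
    population = make_population()
    for n in (30, 100):
        for key, value in summarize(population, n).items():
            print(f"{key}: {value:.2f}")
        print()
```

Running the script prints one summary for n = 30 and one for n = 100. Because the population here is synthetic, the numbers will not match those obtained from movies.jmp; only the qualitative comparison between the two sample sizes carries over.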
Solution

