
a) The R commands x=rcauchy(500); summary(x) generate a random sample of size
500 from the Cauchy distribution and display the sample’s five-number summary. Report the
five-number summary and the interquartile range, and comment on whether or not the smallest
and largest numbers generated from this sample of 500 are outliers. Repeat this 10 times.

b) The R commands m=matrix(rcauchy(50000), nrow=500); xb=apply(m,1,mean); summary(xb) generate the matrix m, which has 500 rows, each a sample of size n=100 from the Cauchy distribution; compute the 500 sample means and store them in xb; and display the five-number summary of xb. Repeat these commands 10 times, and report the 10 sets of five-number summaries. Compare with the 10 sets of five-number summaries from part (a), and comment on whether or not the distribution of the averages seems to be as prone to extreme outliers as that of the individual observations.

c) (5 points) Why does this happen? (Hint: try to calculate E(X) and V(X) for this distribution.) Do the LLN and CLT apply to samples from a Cauchy distribution?

Solution

a)

Use the following commands -

> x = rcauchy(500); summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1135.0000 -0.9987 0.0261 -5.6470 1.1010 132.2000

To find the outlier fences (1.5 times the IQR below the first quartile and above the third quartile), follow these steps -

> lower_q = quantile(x)[2]
> upper_q = quantile(x)[4]
> IQR = upper_q - lower_q
> lower_outlier = lower_q - (1.5*IQR)
> upper_outlier = upper_q + (1.5*IQR)

Then print the two fences -

> print (lower_outlier)
25%
-4.147591
> print (upper_outlier)
75%
4.249579

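Here the IQR is about 1.1010 - (-0.9987) = 2.10, so the fences are about -4.15 and 4.25. The minimum (-1135.0) and the maximum (132.2) both lie far outside this interval, so both are outliers.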
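The fence calculation above can be packaged as a small function and repeated 10 times with replicate(), instead of retyping the commands for each run. This is a sketch: the function name outlier_report and its output names are hypothetical, and it uses quantile() with its default type 7, the same quartile definition summary() uses. Each call draws fresh random numbers, so the values will differ from the transcript.

```r
# Hypothetical helper (not part of the original answer): five-number summary,
# IQR and 1.5*IQR fences of one sample, plus flags for the two extremes.
outlier_report <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)  # default type-7 quartiles and median
  iqr <- q[3] - q[1]                                    # same value as stats::IQR(x)
  lower <- q[1] - 1.5 * iqr                             # lower fence
  upper <- q[3] + 1.5 * iqr                             # upper fence
  c(min = min(x), q1 = q[1], median = q[2], q3 = q[3], max = max(x),
    iqr = iqr, lower_fence = lower, upper_fence = upper,
    min_out = min(x) < lower, max_out = max(x) > upper)  # logicals are stored as 1/0
}

# All 10 repetitions of part (a) at once: one row per sample of size 500.
runs <- t(replicate(10, outlier_report(rcauchy(500))))
round(runs, 3)
```

A value of 1 in the min_out or max_out column means that extreme lies beyond its fence.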

--------------------------------------------------------------

Repeat the same code for each of the remaining runs -

> x = rcauchy(500)
> lower_q = quantile(x)[2]
> upper_q = quantile(x)[4]
> IQR = upper_q - lower_q
> lower_outlier = lower_q - (1.5*IQR)
> upper_outlier = upper_q + (1.5*IQR)

Then print the summary and the outlier fences -

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-387.5000 -0.7837 0.0942 -0.2992 1.0180 176.0000

> lower_outlier
25%
-3.486488
> upper_outlier
75%
3.720933

--------------------------------------------------------------------

> x = rcauchy(500)
> lower_q = quantile(x)[2]
> upper_q = quantile(x)[4]
> IQR = upper_q - lower_q
> lower_outlier = lower_q - (1.5*IQR)
> upper_outlier = upper_q + (1.5*IQR)
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-380.3000 -1.1290 0.0197 0.3132 0.9947 434.3000
> lower_outlier
25%
-4.31406
> upper_outlier
75%
4.180033

---------------------------------------------

The code is the same for the remaining 7 iterations, so only their results are shown.

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-32.78000 -0.82270 -0.06634 0.39880 0.88390 237.70000
> lower_outlier
25%
-3.382684
> upper_outlier
75%
3.443926

-----------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-92.5800 -0.8865 0.0540 2.6860 1.1100 1343.0000
> lower_outlier
25%
-3.881676
> upper_outlier
75%
4.105465

------------------------------------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-120.80000 -0.85490 0.01558 0.60650 1.02700 261.00000
> lower_outlier
25%
-3.677512
> upper_outlier
75%
3.849527

--------------------------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-143.8000 -0.9392 0.0107 0.0674 0.8691 365.8000
> lower_outlier
25%
-3.6517
> upper_outlier
75%
3.581553

-------------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-151.8000 -1.0200 0.0114 1.1640 1.0030 330.6000
> lower_outlier
25%
-4.055198
> upper_outlier
75%
4.037374

---------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1583.0000 -1.0750 -0.0281 -1.9530 0.9142 1230.0000
> lower_outlier
25%
-4.059831
> upper_outlier
75%
3.898543

---------------------------------------------------

> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-116.80000 -1.10000 -0.07979 0.00487 1.06200 69.92000
> lower_outlier
25%
-4.343726
> upper_outlier
75%
4.305393

---------------------------------------

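So in every one of the 10 runs, the minimum and the maximum both lie far outside the fences. The quartiles stay near -1 and 1 (IQR between about 1.7 and 2.2) and the fences stay near -4 and 4, while the extremes range from -1583.0 to 1343.0. The smallest and largest values are outliers every time.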
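The fences near -4 and 4 in every run are what theory predicts. The sketch below uses R's standard Cauchy functions (location 0, scale 1, the defaults of rcauchy()) to compute the population quartiles, the resulting fences, and the chance that a single draw lands outside them. Using the population quartiles instead of each sample's quartiles is an approximation.

```r
# Population quartiles of the standard Cauchy distribution
q1 <- qcauchy(0.25)                       # -1
q3 <- qcauchy(0.75)                       #  1
iqr <- q3 - q1                            #  2
fences <- c(q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # -4 and 4

# Probability that a single observation falls outside the fences
p_out <- pcauchy(fences[1]) + pcauchy(fences[2], lower.tail = FALSE)
p_out                                     # about 0.156
500 * p_out                               # expected number of outliers in a sample of 500

# Probability that the maximum of 500 draws does NOT exceed the upper fence
pcauchy(fences[2])^500                    # essentially 0
```

With roughly 15.6% of all draws (about 78 of 500) expected beyond the fences, the sample minimum and maximum are practically guaranteed to be outliers.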

______________________________________________________________________

b)

We get the following output from the commands -

> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-809.6000 -0.8745 -0.0071 -1.0010 0.9474 562.9000

The following are the remaining 9 iterations of the same commands -

> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-943.4000 -0.9225 0.0852 -0.7170 1.0590 189.0000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-567.1000 -0.8232 0.0170 -0.2387 1.1530 265.5000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1275.0000 -0.8804 0.0864 -1.0730 1.1590 289.3000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-61.5400 -1.1300 0.0266 1.1680 1.1910 392.7000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-603.2000 -1.1230 -0.1073 -2.0570 0.7257 68.4300
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-380.6000 -0.9142 0.1299 -1.0370 1.1040 147.6000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-129.300 -0.918 -0.046 8.521 1.099 3896.000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-163.50000 -1.07100 0.02538 0.96250 0.90580 130.70000
> m=matrix(rcauchy(50000),nrow=500);
> xb=apply(m,1,mean);summary(xb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-172.60000 -1.13100 -0.01897 -0.22130 1.01500 69.69000
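The 10 repetitions above can also be run in one loop. In this sketch, rowMeans(m) replaces apply(m, 1, mean) (it computes the same row averages, faster), set.seed() is added only for reproducibility with an arbitrary seed, and the object name five_num_means is hypothetical.

```r
set.seed(7)  # arbitrary seed, only so the run can be reproduced
five_num_means <- t(replicate(10, {
  m  <- matrix(rcauchy(50000), nrow = 500)  # 500 rows, each a Cauchy sample of n = 100
  xb <- rowMeans(m)                          # the 500 sample means, same as apply(m, 1, mean)
  quantile(xb, c(0, 0.25, 0.5, 0.75, 1), names = FALSE)  # min, Q1, median, Q3, max
}))
colnames(five_num_means) <- c("Min", "Q1", "Median", "Q3", "Max")
round(five_num_means, 3)
```

Each row is the five-number summary of the 500 averages from one repetition.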

----------------------------------------------------------

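Comparing with part (a), the averages behave just like the individual observations. In all 10 runs the quartiles of xb stay near -1 and 1 (IQR between about 1.8 and 2.3), as they did for single observations, and the extremes are still huge (from -1275.0 to 3896.0).

So averaging 100 Cauchy observations neither shrinks the spread nor removes the extreme outliers: the distribution of the averages is just as prone to extreme outliers as that of the individual observations, and the two distributions look the same.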
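One way to quantify this comparison is to divide the IQR of 500 averages by the IQR of 500 single observations. The sketch below does this for Cauchy data and, as a contrast where the CLT does apply, for standard normal data. The helper name iqr_ratio is hypothetical and the seed is arbitrary.

```r
set.seed(11)  # arbitrary seed
iqr_ratio <- function(rgen, n = 100, reps = 500) {
  x  <- rgen(reps)                                     # reps single observations
  xb <- rowMeans(matrix(rgen(reps * n), nrow = reps))  # reps averages of n observations each
  IQR(xb) / IQR(x)
}
iqr_ratio(rcauchy)  # stays near 1: averaging does not shrink the spread
iqr_ratio(rnorm)    # near 1/sqrt(100) = 0.1, as the CLT predicts
```

For normal data the averages are about 10 times less spread out than single observations; for Cauchy data they are not narrower at all.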

_____________________________________________________

c)

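This happens because the Cauchy distribution has neither a mean nor a variance: E|X| = ∞, so E(X) does not exist, and therefore V(X) does not exist either. The LLN requires a finite mean and the CLT requires a finite variance, so neither applies to samples from a Cauchy distribution. In fact, the mean of n independent standard Cauchy observations is again standard Cauchy for every n, which is why the averages in part (b) are just as spread out as the single observations in part (a).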
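The hint in part (c) can be worked out directly for the standard Cauchy density (the one rcauchy() samples from by default). The first absolute moment diverges, the second moment diverges, and the characteristic function shows that the sample mean has the same distribution as a single observation:

$$ f(x) = \frac{1}{\pi(1+x^2)}, \qquad -\infty < x < \infty $$

$$ E|X| = \int_{-\infty}^{\infty} \frac{|x|}{\pi(1+x^2)}\,dx = \frac{2}{\pi}\int_{0}^{\infty} \frac{x}{1+x^2}\,dx = \frac{1}{\pi}\lim_{b\to\infty}\ln(1+b^2) = \infty $$

$$ E(X^2) = \int_{-\infty}^{\infty} \frac{x^2}{\pi(1+x^2)}\,dx = \infty \quad\Rightarrow\quad V(X)\ \text{does not exist} $$

$$ \varphi_X(t) = E\!\left(e^{itX}\right) = e^{-|t|}, \qquad \varphi_{\bar X_n}(t) = \left[\varphi_X\!\left(\tfrac{t}{n}\right)\right]^n = \left(e^{-|t|/n}\right)^n = e^{-|t|} $$

Since the characteristic function of the sample mean is the same for every n, the sample mean does not settle down to a constant and does not approach a normal shape.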

Tutor: Dr Jack