The package bayesm includes the dataset Scotch

I need help with R code and visualizations for the questions below. The datasets are part of R packages that can be installed directly from within R.

Q2: Whisky

The package bayesm includes the dataset Scotch, which reports which brands of whisky 2218 respondents consumed in the previous year.

a) Draw a barchart of the number of respondents per brand. What ordering of the brands do you think is best?

b) There are 20 named brands and a further category Other.brands. That entails drawing a lot of bars. If you decided to plot only the biggest brands individually and group the rest all together in the ‘Other’ group, what cutoff would you use for defining a big brand?

c) Another version of the dataset called whiskey is given in the package flexmix. It is made up of two data frames, whiskey with the basic data, and whiskey_brands with information on whether the whiskeys are blends or single malts. How would you incorporate this information in your graphics, by using colour, by using a different ordering, or by drawing two graphics rather than one?

Solution

The olive oils dataset is well known and can be found in several packages, for instance as olives in extracat. The original source for the data is the paper [Forina et al., 1983].

a) Draw a scatterplot matrix of the eight continuous variables. Which of the fatty acids are strongly positively associated and which strongly negatively associated?

b) Are there outliers or other features worth mentioning?

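The complete R snippet is:

library(extracat)       # provides olives, per the question; install it first if it is missing
data(olives)
pairs(olives[, 3:10])   # scatterplot matrix of the eight fatty acids
cor(olives[, 3:10])     # pairwise correlations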

> cor(olives[,c(3:10)])
              palmitic palmitoleic     stearic      oleic    linoleic   linolenic
palmitic     1.0000000  0.83560497 -0.17039178 -0.8373354  0.46068446  0.31932669
palmitoleic  0.8356050  1.00000000 -0.22218545 -0.8524384  0.62162666  0.09311163
stearic     -0.1703918 -0.22218545  1.00000000  0.1135987 -0.19781693  0.01891719
oleic       -0.8373354 -0.85243835  0.11359873  1.0000000 -0.85031837 -0.21817123
linoleic     0.4606845  0.62162666 -0.19781693 -0.8503184  1.00000000 -0.05743858
linolenic    0.3193267  0.09311163  0.01891719 -0.2181712 -0.05743858  1.00000000
arachidic    0.2282991  0.08548117 -0.04097892 -0.3199623  0.21097260  0.62023577
eicosenoic   0.5019518  0.41635048  0.14037748 -0.4241459  0.08904499  0.57831851
              arachidic  eicosenoic
palmitic     0.22829912  0.50195179
palmitoleic  0.08548117  0.41635048
stearic     -0.04097892  0.14037748
oleic       -0.31996234 -0.42414586
linoleic     0.21097260  0.08904499
linolenic    0.62023577  0.57831851
arachidic    1.00000000  0.32866349
eicosenoic   0.32866349  1.00000000

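The pairs with the largest absolute values are the most strongly correlated. Strongly positive: palmitic and palmitoleic (0.84). Strongly negative: oleic with palmitoleic (-0.85), with linoleic (-0.85) and with palmitic (-0.84). The next largest values are only moderate: palmitoleic and linoleic (0.62), linolenic and arachidic (0.62), and linolenic and eicosenoic (0.58).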
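Reading the strongest associations off an 8 by 8 matrix by eye is error-prone, so a small helper can list them instead. The sketch below is illustrative only: `strongest_pairs` is a hypothetical name, the 0.8 threshold is an arbitrary assumption, and the 4 by 4 matrix is typed in (rounded to four decimals) from the correlation output above so that the example runs without the olives data.

```r
# Hypothetical helper: list variable pairs whose |r| meets a threshold,
# strongest first. Only the upper triangle is scanned so each pair appears once.
strongest_pairs <- function(cm, threshold = 0.8) {
  idx <- which(upper.tri(cm) & abs(cm) >= threshold, arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), ]
}

# Subset of the correlation matrix shown above, rounded to four decimals.
vars <- c("palmitic", "palmitoleic", "oleic", "linoleic")
cm <- matrix(c(
   1.0000,  0.8356, -0.8373,  0.4607,
   0.8356,  1.0000, -0.8524,  0.6216,
  -0.8373, -0.8524,  1.0000, -0.8503,
   0.4607,  0.6216, -0.8503,  1.0000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

res <- strongest_pairs(cm, threshold = 0.8)
print(res)
```

With the data loaded, `strongest_pairs(cor(olives[, 3:10]), 0.8)` should apply the same filter to all eight variables.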

The outlier analysis can be performed with the following code:

library(outliers)
# flag values whose chi-squared score lies beyond the 90th percentile
outs <- scores(olives$palmitic, type = "chisq", prob = 0.9)
olives$palmitic[outs]

The results are:

> olives$palmitic[outs]
 [1]  911  911  875  943  952 1529 1510 1539 1527 1518 1514 1620 1543 1721 1742 1517
[17] 1577 1590 1621 1753 1679 1693 1692 1638 1680  926  916  905  610  920  952  922
[33] 1732 1515 1521 1639

This was done for palmitic; the same analysis can be repeated for each of the other numeric variables.
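Repeating the check for every numeric variable can be sketched as a loop over the columns. The example below is a base-R approximation, not the outliers package itself: `chisq_flags` is a hypothetical helper that scores each value as its squared deviation from the mean divided by the variance and compares it with a chi-squared quantile, which is intended to mirror what `scores(..., type = "chisq", prob = 0.9)` does. The `toy` data frame is made up purely so the code runs on its own.

```r
# Hypothetical helper: TRUE where the chi-squared score of a value,
# (x - mean)^2 / var, exceeds the chosen chi-squared quantile (1 df).
chisq_flags <- function(x, prob = 0.9) {
  score <- (x - mean(x))^2 / var(x)
  score > qchisq(prob, df = 1)
}

# Made-up data for illustration only (not the olives data).
toy <- data.frame(
  group = rep(c("x", "y"), each = 4),
  a     = c(10, 11, 9, 10, 12, 10, 11, 30),
  b     = c(5, 5, 6, 4, 5, 5, 6, 5)
)

# Apply the check to the numeric columns only and keep the flagged values.
num_cols <- vapply(toy, is.numeric, logical(1))
flagged  <- lapply(toy[num_cols], function(col) col[chisq_flags(col)])
print(flagged)
```

For the olives data the equivalent call would presumably be `lapply(olives[, 3:10], function(col) col[chisq_flags(col)])`. A 0.9 cutoff flags roughly one value in ten when a variable is close to normal, so the flagged values are better read as candidates to inspect in the scatterplot matrix than as confirmed outliers.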

Please note that, as per the answering guidelines, we can answer only one full question at a time.
