Q1. Give two examples, apart from those given in the slides, for each of the following:
a) Data mining from the commercial viewpoint
b) Data mining from the scientific viewpoint
Q2. Differentiate between classification of data and clustering of data with the help of suitable examples.
Q3. Why do we need preprocessing of the data? Explain any 4 data preprocessing techniques.
Q4. Differentiate between correlation analysis and covariance analysis of data with the help of a suitable example not given in the slides.
Solution
Q1. a) Examples of data mining from the commercial viewpoint:
b) Examples of data mining from the scientific viewpoint:
Q2. Classification: the groups (or classes) are specified beforehand, with each training data instance belonging to a particular class. The classification algorithm is supposed to learn the association between an instance's features and the class it belongs to.
Example: an insurance company assigning customers to high-risk and low-risk categories.
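To make the insurance example concrete, here is a minimal sketch of a nearest-centroid classifier. The features (number of past claims, annual mileage), the data values, and the function names are all hypothetical and chosen only for illustration; the point is that the class labels are given in the training data.

```python
# Hypothetical labeled training data:
# (claims in the last 5 years, annual mileage in 1000 km) -> predefined risk class
training = [
    ((0, 8), "low"),
    ((1, 10), "low"),
    ((0, 12), "low"),
    ((4, 30), "high"),
    ((5, 25), "high"),
    ((3, 35), "high"),
]

def class_centroids(samples):
    """Average the feature vectors of each predefined class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(features, centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))

centroids = class_centroids(training)
print(classify((4, 28), centroids))  # a new customer with many claims and high mileage
print(classify((0, 9), centroids))   # a new customer with no claims and low mileage
```

The classifier can only ever answer with one of the labels it saw during training ("low" or "high"), which is what distinguishes it from clustering.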
Clustering: the groups are based on the similarities of the data instances to each other. No predefined output class is used in training, and the clustering algorithm is supposed to learn the grouping itself.
Example: an online movie company recommending a movie to a person because other customers who made similar movie choices in the past have rated that movie favorably.
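The movie example can be sketched with a plain k-means clustering of viewers. The ratings below, the choice of k = 2, and the deterministic initialization (first k points as starting centroids) are assumptions made for illustration; note that no class labels appear anywhere in the input.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and re-averaging centroids."""
    # Assumption for reproducibility: start from the first k points.
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical unlabeled data: each viewer's average rating for (action, romance) movies.
viewers = [(5, 1), (1, 5), (4, 2), (2, 4), (5, 2), (1, 4)]

assignment, centroids = kmeans(viewers, k=2)
print(assignment)  # cluster index per viewer, discovered from similarity alone
print(centroids)
```

Viewers that end up in the same cluster have similar tastes, so a movie rated highly by one of them is a candidate recommendation for the others.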
Q3. Any processing performed on raw data to prepare it for analysis is called data preprocessing. It is a very important task because the given data may contain unnecessary information that is not required for the job. The 4 preprocessing techniques are:
Q4. Correlation analysis is a method of statistical evaluation. It is used to study the strength of the relationship between two numerically measured, continuous variables,
e.g. height and weight.
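As a rough illustration of the height and weight example, the sketch below computes Pearson's correlation coefficient as the sample covariance divided by the product of the two standard deviations. The measurements and function names are made up for illustration.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

# Hypothetical measurements for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 77, 84]

cov_cm = sample_covariance(heights_cm, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)

# Re-expressing height in metres rescales the covariance but not the correlation.
heights_m = [h / 100 for h in heights_cm]
cov_m = sample_covariance(heights_m, weights_kg)
r_m = pearson_r(heights_m, weights_kg)

print(cov_cm, cov_m)
print(r_cm, r_m)
```

Because the correlation coefficient is unit-free and always lies between -1 and 1, it measures the strength of the relationship directly, whereas the raw covariance depends on the units of measurement.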
Covariance analysis (analysis of covariance, ANCOVA) combines analysis of variance (ANOVA) with linear regression, so that intergroup variance is accounted for when performing ANOVA.
