1. List any two similarities and dissimilarities between the three most common standard processes involved in data mining.
2. Analyze and discuss the various data mining techniques. From your perspective, which technique do you prefer among these? Why?
3. What is an ANN? Describe the various types of ANN. Which ANN do you prefer among the variety of ANNs? Justify the reason behind this.
4. Pick any text-mining tool that impressed you the most and give an overview of its text mining process, sentiment analysis and speech analytics.
5. Bring out any two comparisons and variations between Web analytics and Social analytics.
Solution
Data Mining:
Data mining is the process of analysing data from various perspectives and summarizing it into useful information.
Data mining is sometimes called knowledge discovery.
The three most common standard processes in data mining are as follows:
CRISP-DM: Cross-Industry Standard Process for Data Mining.
SEMMA: Sample, Explore, Modify, Model, Assess.
KDD: Knowledge Discovery in Databases.
Similarities between the most common standard processes involved in data mining are as follows:
The Sample and Explore stages of SEMMA roughly correspond to the Data Understanding phase of CRISP-DM and are similar to the pre-processing stage of KDD.
The KDD and SEMMA stages can be compared as follows:
Sample can be identified with Selection.
Explore can be identified with Pre-processing.
Modify can be identified with Transformation.
Model can be identified with Data Mining.
Assess can be identified with Interpretation/Evaluation.
Dissimilarities between the most common standard processes involved in data mining are as follows:
The Knowledge Discovery in Databases (KDD) model is an iterative and interactive model with a total of nine steps. It refers to the overall process of finding knowledge in data and emphasizes the high-level application of particular data mining methods.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) was launched in late 1996 by DaimlerChrysler (then Daimler-Benz), SPSS (then ISL) and NCR. The model has been refined over the years. It has six steps or phases.
The Sample, Explore, Modify, Model, Assess (SEMMA) model [5] was developed by the SAS Institute. It has five phases.
2) Data mining techniques:
There are eight techniques in data mining.
Among the data mining techniques, I prefer clustering because it is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. It helps users understand the natural grouping or structure in a data set.
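The partitioning idea can be illustrated with k-means, one common clustering algorithm (the answer above does not name a specific algorithm, so this choice is an assumption). The sketch below uses made-up 2-D points and hand-picked starting centroids; the function name k_means is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, iterations=10):
    """Minimal k-means sketch: partition points into len(centroids) clusters."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

With this data the three points near (1, 1) end up in one cluster and the three near (8.5, 8.5) in the other, which is the "natural grouping" the answer refers to.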
3)
Artificial Neural Network:
An artificial neural network (ANN) is a computational model based on the structure and functions of biological neural networks. Information that flows through the network affects the structure of the ANN because a neural network changes, or learns in a sense, based on that input and output.
A basic ANN has three interconnected layers. The first layer consists of input neurons. Those neurons send data on to the second (hidden) layer, which in turn sends its output to the third layer of output neurons.
Types of artificial neural networks:
There are two artificial neural network topologies: feedforward and feedback.
Feedforward ANN:
The information flow is unidirectional. A unit sends information to another unit from which it does not receive any information. There are no feedback loops. Feedforward networks are used in pattern generation, recognition and classification. They have fixed inputs and outputs.
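The one-way flow can be sketched as a forward pass through a small network with an input, a hidden and an output layer. The weights, biases and the sigmoid activation below are illustrative assumptions, not values from any trained model; the function names are hypothetical.

```python
from math import exp

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: every output neuron sees every input."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feedforward(inputs, layers):
    """Pass data through the layers in order; nothing ever flows backwards."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Assumed example network: 2 input neurons -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1])
output = ([[1.2, -0.7]], [0.05])

result = feedforward([1.0, 0.0], [hidden, output])
print(result)  # a single value between 0 and 1
```

Each layer only reads the activations of the layer before it, so the number of inputs and outputs is fixed by the shapes of the weight lists.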
Feedback ANN:
Here, feedback loops are allowed. Feedback networks are used in content-addressable memories.
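A Hopfield network is one well-known feedback ANN that behaves as a content-addressable memory; it is used here as an assumed example, since the answer does not name a specific model. The sketch stores one pattern of +1/-1 values and recovers it from a corrupted copy. The function names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weights for patterns of +1/-1 values, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / len(patterns)
    return weights

def recall(weights, state, sweeps=5):
    """Repeatedly update each neuron from the current outputs of the others."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            # Feedback loop: neuron i reads the network's own current state.
            total = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if total >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])

noisy = [1, -1, 1, -1, 1, 1]  # the last value has been flipped
recalled = recall(weights, noisy)
print(recalled == stored)
```

The memory is addressed by content: presenting a partial or noisy version of the pattern is enough for the network to settle back on the stored one.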
The feedforward ANN is the most widely used type because of its varied applications:
a. Physiological feed-forward system:
In physiology, feed-forward control is exemplified by the normal anticipatory regulation of heartbeat in advance of actual physical exertion.
b. Feed-forward systems in computing:
In computing, feed-forward normally refers to a perceptron network in which the outputs from all neurons go to following but not preceding layers, so there are no feedback loops. The connections are set up during a training phase, which in effect is when the system is a feedback system.
c. Automation and machine control:
Feedforward control is a discipline within the field of automatic controls used in automation.
4)
Speech analytics:
Speech analytics allows users to analyse and extract information from both live and recorded conversations. It is being used effectively to gather intelligence for security purposes, to enhance the presentation and utility of rich media applications, and perhaps most significantly, to deliver quantitative business intelligence through the analysis of millions of recorded calls.
Sentiment analysis:
Sentiment analysis is a branch of speech analytics that focuses specifically on assessing the emotional states displayed in a conversation. One common use of sentiment analysis within contact centres is to provide insight into a customer’s feelings about an organisation, its products, services and customer service processes, as well as its individual agents' behaviours.
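As a rough illustration of how emotional state can be scored, the sketch below counts positive and negative words in call utterances that are assumed to have already been transcribed to text. The word lists, the sample transcript and the function names are all made up for this example; real tools use far richer models.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"great", "helpful", "happy", "thanks", "excellent"}
NEGATIVE = {"angry", "slow", "terrible", "cancel", "frustrated"}

def sentiment_score(utterance):
    """Positive-word count minus negative-word count for one utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives

def sentiment_label(score):
    """Turn a numeric score into a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up transcript of a contact-centre call.
transcript = [
    "I am frustrated, the service is slow",
    "Thanks, the agent was very helpful",
    "Please send me the invoice",
]

for line in transcript:
    print(sentiment_label(sentiment_score(line)), "-", line)
```

Scoring each utterance separately makes it possible to see how a customer's mood changes over the course of a call.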
5)
Comparisons and variations between web analytics and social analytics are as follows:
Web analytics gives information about traffic levels, user behaviour and referral sources on a website.
Social analytics gathers information from social networking sites and helps to understand users’ attitudes and to build effective consumer profiles and strategies.
Social media listening is the process of aggregating and assessing what is being said about a company, individual, product or brand on the internet.
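The web analytics measures mentioned above (traffic levels, user behaviour, referral sources) can be sketched as simple counts over page-view records. The record layout and the sample data are assumptions made for illustration and do not follow any particular analytics product.

```python
from collections import Counter

# Hypothetical page-view records: (visitor_id, page, referrer).
page_views = [
    ("v1", "/home", "google.com"),
    ("v1", "/pricing", "google.com"),
    ("v2", "/home", "twitter.com"),
    ("v3", "/home", "google.com"),
]

# Traffic level: total page views and number of distinct visitors.
traffic = len(page_views)
unique_visitors = len({visitor for visitor, _, _ in page_views})

# Referral sources: where the page views came from.
referrals = Counter(referrer for _, _, referrer in page_views)

# User behaviour: the most viewed page.
top_pages = Counter(page for _, page, _ in page_views).most_common(1)

print(traffic, unique_visitors, referrals, top_pages)
```

Social analytics would start from different raw material, such as posts and comments about a brand, rather than from visits to the organisation's own website.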


