I want to solve this data mining problem, and I would like a clear solution, please.
Use the data points and cluster them into 3 groups with the K-means algorithm. Discover frequent patterns using the transactions table. Calculate support and confidence values with an algorithm of your choice. (The min support is 0.4 and the min confidence is 0.6.)
Solution
DATA MINING
Data mining, an important part of Knowledge Discovery in Databases (KDD), is a powerful technology for discovering the most important information in data. Data are being collected across a wide variety of fields, and there is an urgent need for computational techniques that can handle the challenges posed by these new types of data sets.
What is Cluster Analysis?
Cluster analysis has grown quickly as a field. Its goal is to group data objects based on information found in the data that describes the objects and their relationships. The purpose is to separate the objects into groups so that objects within a group are related (similar) to one another and unrelated to the objects in other groups.
The Basic K-means Algorithm
The k-means clustering technique is one of the simplest clustering algorithms.
Description of the basic algorithm:
Assume we have a set of data points, D = (X1, …, Xn). First choose K initial centroids from these points, where K is a user-specified parameter: the number of clusters desired. A common choice is to pick the initial cluster centers at random, one for each cluster.
Each point is then assigned to the nearest centroid, so that every point belongs to exactly one group.
The centroid of each cluster is then updated to the mean of the points assigned to it, and that mean becomes the new centroid.
We repeat the assignment and update steps until no point moves from one cluster to another or, equivalently, until every centroid remains the same.
Basic K-Means Clustering
1: Choose K points as the initial centroids
2: Repeat
3: Assign each point to the closest cluster center
4: Recompute the center of each cluster
5: Until the centroids do not change
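The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` safeguard, and the toy 2-D points are assumptions made for the example, and `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Basic k-means: alternate assignment and update until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each point to the index of its closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # Step 5: no point changed cluster, so we are done.
        assignment = new_assignment
        # Step 4: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Toy data (hypothetical): two obvious groups in the plane.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assignment, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(assignment, centroids)
```

The initial centroids are passed in explicitly so that a run is reproducible; choosing them at random would also match the description above.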
Steps to cluster the data points using the K-means algorithm
The data set consists of the values of the x, y, z variables for 10 points:
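| Point | x | y | z |
| --- | --- | --- | --- |
| 1 | 6 | 14 | 10 |
| 2 | 2 | 10 | 16 |
| 3 | 0 | 8 | 18 |
| 4 | 20 | 0 | 13 |
| 5 | 19 | 1 | 0 |
| 6 | 18 | 3 | 17 |
| 7 | 14 | 20 | 8 |
| 8 | 15 | 9 | 9 |
| 9 | 13 | 12 | 19 |
| 10 | 7 | 15 | 15 |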
This data set is to be grouped into three clusters.
As a first step in finding a sensible initial partition, define the initial cluster means. Distances between points are measured with the Euclidean distance.
Euclidean distance measure:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
Example: (6,14,10) and (2,10,16)
$$d = \sqrt{(2-6)^2 + (10-14)^2 + (16-10)^2} = \sqrt{16 + 16 + 36} = \sqrt{68} \approx 8.25$$
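The same distance can be checked with a short Python snippet. The helper name `euclidean` is just an illustrative choice; it works for points of any dimension.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Distance between point 1 and point 2 of the data set.
d = euclidean((6, 14, 10), (2, 10, 16))
print(round(d, 3))  # 8.246, i.e. the square root of 68
```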
| | Data point | Mean vector (centroid) |
| --- | --- | --- |
| Group 1 | 1 | (6,14,10) |
| Group 2 | 5 | (19,1,0) |
| Group 3 | 10 | (7,15,15) |
The remaining individuals are now examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a new member is added. This leads to the following series of steps:
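| Step | Cluster 1 data set | Cluster 1 mean vector (centroid) | Cluster 2 data set | Cluster 2 mean vector (centroid) | Cluster 3 data set | Cluster 3 mean vector (centroid) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | (6,14,10) | 5 | (19,1,0) | 10 | (7,15,15) |
| 2 | 1 | (6,14,10) | 5 | (19,1,0) | 2, 10 | (4.5,12.5,15.5) |
| 3 | 1 | (6,14,10) | 5 | (19,1,0) | 2, 3, 10 | (3,11,16.33) |
| 4 | 1 | (6,14,10) | 4, 5 | (19.5,0.5,6.5) | 2, 3, 10 | (3,11,16.33) |
| 5 | 1 | (6,14,10) | 4, 5, 6 | (19,1.33,10) | 2, 3, 10 | (3,11,16.33) |
| 6 | 1, 7 | (10,17,9) | 4, 5, 6 | (19,1.33,10) | 2, 3, 10 | (3,11,16.33) |
| 7 | 1, 7 | (10,17,9) | 4, 5, 6, 8 | (18,3.25,9.75) | 2, 3, 10 | (3,11,16.33) |
| 8 | 1, 7 | (10,17,9) | 4, 5, 6, 8 | (18,3.25,9.75) | 2, 3, 9, 10 | (5.5,11.25,17) |

The initial partition has changed, and the three clusters at this stage have the following characteristics:

| | Data set | Mean vector (centroid) |
| --- | --- | --- |
| Cluster 1 | 1, 7 | (10,17,9) |
| Cluster 2 | 4, 5, 6, 8 | (18,3.25,9.75) |
| Cluster 3 | 2, 3, 9, 10 | (5.5,11.25,17) |

Reassigning every point to the nearest of these three centroids moves no point to a different cluster, so the partition is stable and the algorithm stops.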
These are the steps to follow during the clustering.
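The sequential allocation described above can also be run as code. The sketch below assumes points 1, 5 and 10 are the initial cluster means, examines the remaining points in numerical order, and recalculates a cluster's mean vector each time a member is added. The names `sequential_pass` and `mean` are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# The ten (x, y, z) data points, keyed by point number.
points = {
    1: (6, 14, 10), 2: (2, 10, 16), 3: (0, 8, 18), 4: (20, 0, 13),
    5: (19, 1, 0), 6: (18, 3, 17), 7: (14, 20, 8), 8: (15, 9, 9),
    9: (13, 12, 19), 10: (7, 15, 15),
}
seeds = [1, 5, 10]  # assumed initial cluster means (groups 1, 2, 3)

def mean(ids):
    """Mean vector of the points with the given numbers."""
    coords = [points[i] for i in ids]
    return tuple(sum(c) / len(coords) for c in zip(*coords))

def sequential_pass(points, seeds):
    """Add each remaining point to its nearest cluster, updating that mean."""
    clusters = [[s] for s in seeds]
    centroids = [mean(c) for c in clusters]
    for i in sorted(points):
        if i in seeds:
            continue
        nearest = min(range(len(clusters)),
                      key=lambda j: math.dist(points[i], centroids[j]))
        clusters[nearest].append(i)
        centroids[nearest] = mean(clusters[nearest])
    return clusters, centroids

clusters, centroids = sequential_pass(points, seeds)
for members, c in zip(clusters, centroids):
    print(sorted(members), tuple(round(v, 2) for v in c))
```

Running it prints the members and the mean vector of each of the three clusters after the first pass, which makes the hand calculation easy to check.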