Suppose we measure two variables X1 and X2 for four items A, B, C and D

Suppose we measure two variables X1 and X2 for four items A, B, C and D. The data are as follows:

Item    X1    X2
A        5     4
B        1    -2
C       -1     1
D        3     1

Use the K-means clustering technique to divide the items into K = 2 clusters. Start with the initial groups (AC) and (BD), and find the distances using Manhattan distance. Show all the steps. Use the procedures and tables as discussed in the class.

Solution

The given data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the x1 and x2 values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:

           Individual   Mean Vector (Centroid)
Group 1    A            (5, 4)
Group 2    B            (1, -2)
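The choice of A and B as starting points can be checked with a short sketch. It assumes the coordinates A (5, 4), B (1, -2), C (-1, 1) and D (3, 1); the values for C and D are read off the Manhattan-distance calculations later in this solution, and the helper name `furthest_pair` is only illustrative.

```python
from itertools import combinations
from math import dist

# Assumed coordinates: A and B are stated in the solution,
# C and D are inferred from its Manhattan-distance calculations.
points = {"A": (5, 4), "B": (1, -2), "C": (-1, 1), "D": (3, 1)}


def furthest_pair(pts):
    """Return the two labels whose points have the largest Euclidean distance."""
    return max(
        combinations(pts, 2),
        key=lambda pair: dist(pts[pair[0]], pts[pair[1]]),
    )


seed_1, seed_2 = furthest_pair(points)
print(seed_1, seed_2, round(dist(points[seed_1], points[seed_2]), 2))
```

Under these assumed coordinates the pair returned is A and B, which matches the initial cluster means in the table.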

The remaining individuals, C and D, are now added in sequence. Following the initial groups given in the problem, (AC) and (BD), C joins A and D joins B, and the mean vector of each cluster is recalculated when its new member is added. This leads to the following series of steps:

        Cluster 1                              Cluster 2
Step    Individual   Mean Vector (Centroid)    Individual   Mean Vector (Centroid)
1       A            (5, 4)                    B            (1, -2)
2       A, C         (2, 2.5)                  B, D         (2, -0.5)

The initial partition has now changed, and the two clusters at this stage have the following characteristics:

             Individual   Mean Vector (Centroid)
Cluster 1    A, C         (2, 2.5)
Cluster 2    B, D         (2, -0.5)
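The two centroids are just the component-wise means of the members of each cluster. A minimal sketch, again assuming C = (-1, 1) and D = (3, 1) as inferred from the distance calculations in this solution (the function name `centroid` is illustrative):

```python
# Assumed coordinates; C and D are inferred from the solution's calculations.
points = {"A": (5, 4), "B": (1, -2), "C": (-1, 1), "D": (3, 1)}
clusters = {"Cluster 1": ["A", "C"], "Cluster 2": ["B", "D"]}


def centroid(members):
    """Component-wise mean of a list of (x1, x2) tuples."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


centroids = {
    name: centroid([points[label] for label in labels])
    for name, labels in clusters.items()
}
for name, mean_vector in centroids.items():
    print(name, mean_vector)
```

For Cluster 1 this is ((5 + (-1)) / 2, (4 + 1) / 2) = (2, 2.5), and for Cluster 2 it is ((1 + 3) / 2, (-2 + 1) / 2) = (2, -0.5).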

However, we cannot be sure that every individual has been assigned to the right cluster. Therefore, we compare each individual's Euclidean distance to its own cluster mean with its distance to the other cluster's mean. We find that:

Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
A            3.3                                        5.4
B            4.6                                        1.8
C            3.3                                        3.3
D            1.8                                        1.8

The individuals C and D are equidistant from the centroids of both clusters. Thus C and D may be classified into either of the two clusters.
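The comparison above can be reproduced with a few lines of code. This is a sketch under the same assumed coordinates (C and D inferred from the solution's calculations), using the centroids (2, 2.5) and (2, -0.5); `distance_table` is a hypothetical helper name.

```python
from math import dist, isclose

# Assumed coordinates; C and D are inferred from the solution's calculations.
points = {"A": (5, 4), "B": (1, -2), "C": (-1, 1), "D": (3, 1)}
centroids = {1: (2.0, 2.5), 2: (2.0, -0.5)}


def distance_table(pts, cents):
    """Euclidean distance from every point to every cluster centroid."""
    return {
        label: {cluster: dist(p, mean) for cluster, mean in cents.items()}
        for label, p in pts.items()
    }


table = distance_table(points, centroids)
for label, row in table.items():
    note = " (tie)" if isclose(row[1], row[2]) else ""
    print(f"{label}: {row[1]:.2f}  {row[2]:.2f}{note}")
```

A and B come out strictly closer to their own centroids, while C and D are flagged as ties, which is why either cluster is acceptable for them.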

Manhattan distance is the distance between two points measured along axes at right angles. In a plane with p1 at (a, b) and p2 at (c, d), it is |a - c| + |b - d|.

The Manhattan distance between A and B is |5 - 1| + |4 - (-2)| = 4 + 6 = 10

The Manhattan distance between A and C is |5 - (-1)| + |4 - 1| = 6 + 3 = 9

The Manhattan distance between A and D is |5 - 3| + |4 - 1| = 2 + 3 = 5
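The same three calculations can be written as a small function. The coordinates are the ones that appear inside the absolute values above; the function name `manhattan` is illustrative.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Coordinates as they appear in the calculations above.
A, B, C, D = (5, 4), (1, -2), (-1, 1), (3, 1)

print(manhattan(A, B))  # 4 + 6
print(manhattan(A, C))  # 6 + 3
print(manhattan(A, D))  # 2 + 3
```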

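A and B are the furthest apart (distance 10), so under Manhattan distance they again define the initial cluster means:

           Individual   Mean Vector (Centroid)
Group 1    A            (5, 4)
Group 2    B            (1, -2)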
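Since the question asks for Manhattan distance with the initial groups (AC) and (BD), one reassignment pass can be sketched as follows. This is only an illustration under stated assumptions: the coordinates of C and D are inferred from the calculations above, the centroids are kept as means (as in the tables of this solution), ties are resolved by leaving an item in its current cluster, and `reassign` is a hypothetical helper name.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean(members):
    """Component-wise mean of a list of (x1, x2) tuples."""
    return tuple(sum(c) / len(members) for c in zip(*members))


# Assumed coordinates; C and D are inferred from the solution's calculations.
points = {"A": (5, 4), "B": (1, -2), "C": (-1, 1), "D": (3, 1)}
# Initial groups from the question: (AC) -> cluster 1, (BD) -> cluster 2.
assignment = {"A": 1, "C": 1, "B": 2, "D": 2}


def reassign(pts, current):
    """One K-means pass: compute centroids, then move each item to the
    nearest centroid (Manhattan distance), keeping its cluster on ties."""
    groups = {}
    for label, cluster in current.items():
        groups.setdefault(cluster, []).append(pts[label])
    cents = {cluster: mean(members) for cluster, members in groups.items()}

    updated = {}
    for label, p in pts.items():
        own = current[label]
        # Sort key: distance first, then prefer the item's own cluster.
        updated[label] = min(
            cents, key=lambda c: (manhattan(p, cents[c]), c != own)
        )
    return updated, cents


updated, centroids = reassign(points, assignment)
print(centroids)
print(updated == assignment)
```

With these assumptions the Manhattan distances to the two centroids are 4.5 and 7.5 for A, 5.5 and 2.5 for B, 4.5 and 4.5 for C, and 2.5 and 2.5 for D, so no item moves and the partition (AC), (BD) is unchanged after the pass.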

Tutor: Dr Jack