Produce a new version of kmeans called MRKmeansMultirun Kmea

Produce a new version of k-means called MR-K-means ("Multi-run K-means") in R, which has the following inputs:

• The number of clusters k (same as K-means)

• A random-number "meta seed" s (of the same type as the K-means seed)

• A cluster evaluation measure eval (1, 2, 3, or 4, referring to the four cluster evaluation measures introduced earlier)

MR-K-means runs K-means 10 times (with different seeds), obtaining clusterings X1,...,X10, and returns the clustering Xi with the highest value eval(Xi), as well as the maximum, average, minimum, and standard deviation of eval(X1),...,eval(X10).

The language is R.

Project Objectives: In this project you will learn to use the clustering algorithms DBSCAN and K-Means and how to summarize and interpret clustering results. Moreover, you will implement some post processing functions, 4 cluster evaluation measures and functions that run and interpret experiments in R.

Cluster Evaluation Measures: In Project2 the following four cluster evaluation measures will be used and therefore have to be implemented:

Let

O be a dataset

X={C₁,…,C_k} be a clustering of O with C_i ⊆ O (for i=1,…,k), C₁∪…∪C_k ⊆ O and C_i∩C_j=∅ (for i≠j)

MSE(X)= (Σ_{o∈O} d(o,centroid(cluster(o,X)))²)/|O|

with cluster(o,X) returning the cluster to which o belongs in X, centroid(C) returning the centroid[1] of cluster C, |O| denoting the number of objects in O, and d denoting Euclidean distance.

M_MSE(X)= 1/(MSE(X)+0.1)

with cluster(o) returning the cluster to which o belongs, and centroid(C) returning the centroid of cluster C.
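The two definitions above can be sketched in R as follows. This is only an illustration under stated assumptions: the function names `mse` and `m_mse` and the arguments `data` (numeric attributes, class column already removed) and `labels` (one cluster label per object) are hypothetical, and every object is assumed to belong to a cluster, as is the case for K-means.

```r
# Sketch: MSE and M_MSE for a clustering in which every object has a cluster.
# `data` and `labels` are hypothetical argument names, not from the assignment.
mse <- function(data, labels) {
  data <- as.matrix(data)
  # For each attribute, replace every value by the mean of its cluster, so
  # row i of `cents` is the centroid of the cluster that object i belongs to.
  cents <- apply(data, 2, function(col) ave(col, labels))
  # Squared Euclidean distance to the own centroid, averaged over |O| objects.
  mean(rowSums((data - cents)^2))
}

m_mse <- function(data, labels) 1 / (mse(data, labels) + 0.1)

# One cluster C = {(0,0), (1,2), (2,1)} with centroid (1,1):
pts <- rbind(c(0, 0), c(1, 2), c(2, 1))
mse(pts, c(1, 1, 1))    # (2 + 1 + 1) / 3 = 4/3
m_mse(pts, c(1, 1, 1))  # 1 / (4/3 + 0.1), about 0.698
```

Because M_MSE is a decreasing function of MSE, a higher M_MSE corresponds to a tighter clustering, which fits a "highest value wins" selection rule.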

PUR(X)= number_of_majority_class_examples(X)/total_number_examples_in_clusters(X)

M_PUR(X)= PUR(X)*min(1, sqrt(sqrt(8/|X|)))*(|C₁∪…∪C_k|/|O|)

where |C₁∪…∪C_k| denotes the number of objects in X which are not outliers
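A possible R sketch of PUR and M_PUR is shown below. The names `purity` and `m_pur`, the arguments `labels` and `classes`, and the convention that the label 0 marks an outlier are all assumptions made for illustration; the assignment text does not fix how outliers are encoded.

```r
# Sketch: purity measures. `outlier` is the (assumed) label of objects that
# belong to no cluster; with K-means there are none.
purity <- function(labels, classes, outlier = 0) {
  keep <- labels != outlier
  tab <- table(labels[keep], classes[keep])  # clusters x classes counts
  # Majority-class count of each cluster, summed, over all clustered objects.
  sum(apply(tab, 1, max)) / sum(keep)
}

m_pur <- function(labels, classes, outlier = 0) {
  keep <- labels != outlier
  k <- length(unique(labels[keep]))          # |X|, the number of clusters
  purity(labels, classes, outlier) *
    min(1, sqrt(sqrt(8 / k))) *              # penalty once |X| exceeds 8
    (sum(keep) / length(labels))             # fraction of non-outliers
}

# Six objects, two clusters, one outlier (label 0):
labels  <- c(1, 1, 1, 2, 2, 0)
classes <- c("a", "a", "b", "b", "b", "a")
purity(labels, classes)  # (2 + 2) / 5 = 0.8
m_pur(labels, classes)   # 0.8 * 1 * (5/6), about 0.667
```

The factor sqrt(sqrt(8/|X|)) only drops below 1 when there are more than 8 clusters, so it penalizes clusterings that reach high purity by splitting the data into many small clusters.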

Datasets: In the project we will use the Complex8 and the Silhouette datasets we already used in Project1. The Complex8 dataset is a 2D dataset and Silhouette is an 18D dataset; the last attribute of each dataset denotes a class variable which should be ignored when clustering the datasets. However, the class variable will be used in the post analysis of the clusters which are generated by K-means and DBSCAN.

Project2 Tasks:

MR-K-means runs K-means 10 times (with different seeds), obtaining clusterings X₁,…,X₁₀, and returns the clustering X_i with the highest value for eval(X_i), as well as the maximum, average, minimum, and standard deviation of eval(X₁),…,eval(X₁₀).
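One way the multi-run procedure could be structured in R is sketched below. It is a minimal sketch under several assumptions, not a reference solution: `mr_kmeans`, `eval_fn`, `runs`, and `m_mse_demo` are hypothetical names; the evaluation measure is passed as a function of `(data, labels)` rather than as a code 1 to 4, because the mapping from codes to measures is not given here; the ten run seeds are derived from the meta seed with `sample.int`; and the built-in `iris` data merely stands in for the project datasets.

```r
# Sketch: run kmeans() `runs` times with seeds derived from meta seed `s`
# and keep the clustering with the highest evaluation score.
mr_kmeans <- function(data, k, s, eval_fn, runs = 10) {
  set.seed(s)                                    # meta seed fixes the run seeds
  seeds <- sample.int(.Machine$integer.max, runs)
  fits <- lapply(seeds, function(seed_i) {
    set.seed(seed_i)                             # one seed per K-means run
    kmeans(data, centers = k)
  })
  scores <- vapply(fits, function(f) eval_fn(data, f$cluster), numeric(1))
  best <- which.max(scores)
  list(clustering = fits[[best]], seed = seeds[best], scores = scores,
       max = max(scores), mean = mean(scores),
       min = min(scores), sd = sd(scores))
}

# Demo scorer (an assumption): 1 / (MSE + 0.1), computed from cluster means.
m_mse_demo <- function(data, labels) {
  data <- as.matrix(data)
  cents <- apply(data, 2, function(col) ave(col, labels))
  1 / (mean(rowSums((data - cents)^2)) + 0.1)
}

res <- mr_kmeans(iris[, 1:4], k = 3, s = 42, eval_fn = m_mse_demo)
res[c("max", "mean", "min", "sd")]
```

Deriving all run seeds from the single meta seed makes the whole experiment reproducible: calling the function again with the same `s` yields the same ten clusterings and the same summary statistics.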

Summarize and interpret the results obtained in the two runs of MR-K-means!

Deliverables for Project2:

[1] E.g., if C={(0,0), (1,2), (2,1)} then centroid(C)=(1,1).

[2] In the case of K-means this ratio is always 1!

[3] It can be found at: http://www2.cs.uh.edu/~ml_kdd/Complex&Diamond/Complex8.data; it has been visualized at: http://www2.cs.uh.edu/~ml_kdd/Complex&Diamond/2DData.htm

[4] Preferably 6 so that K-means and DBSCAN clustering results can be compared more easily—but this might not be feasible!

[5] The extra credit will be up to 10% of the points associated with Project2. No extra credit will be given to erroneous programs for Task6.

[6] Single-spaced; please use an 11-point or 12-point font!
