It is important to define or select similarity measures in d

It is important to define or select similarity measures in data mining applications, such as clustering, outlier analysis, and nearest-neighbor classification. However, studies show that no single similarity measure consistently outperforms the others in all situations. Nonetheless, seemingly different similarity measures may be equivalent after some transformations. Let us consider the 5 data objects in Table 1:

	skin	insu	mass	pedi
x₁	19	88	33.6	0.627
x₂	20	188	26.6	0.351
x₃	28	128	23.3	0.672
x₄	21	94	28.1	0.167
x₅	34	168	43.1	2.288

Table 1: Diabetes

Attribute information is listed below:

Triceps skin fold thickness in mm (skin): minimum value is 0 and maximum value is 99.

2-Hour serum insulin in mu U/ml (insu): minimum value is 0 and maximum value is 850.

Body mass index measured as weight in kg/(height in m)^2 (mass): minimum value is 0 and maximum value is 70.0.

Diabetes pedigree function (pedi): minimum value is 0.05 and maximum value is 2.50.
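
The four attributes span very different ranges, so a distance computed on raw values tends to be dominated by insu. One common preprocessing step is min-max normalization with the ranges listed above. Whether the exercise intends normalization is an assumption here, and the names `RANGES` and `min_max_normalize` are illustrative only. A minimal sketch:

```python
# Attribute ranges (min, max) as listed in the attribute information.
RANGES = {
    "skin": (0, 99),
    "insu": (0, 850),
    "mass": (0, 70.0),
    "pedi": (0.05, 2.50),
}


def min_max_normalize(record, ranges=RANGES):
    """Map each attribute value to [0, 1] via (value - min) / (max - min)."""
    return {
        name: (value - ranges[name][0]) / (ranges[name][1] - ranges[name][0])
        for name, value in record.items()
    }


# Illustrative usage on object x1 from Table 1.
x1 = {"skin": 19, "insu": 88, "mass": 33.6, "pedi": 0.627}
x1_scaled = min_max_normalize(x1)
print(x1_scaled)
```

After this step every attribute contributes on the same 0 to 1 scale, so no single attribute dominates only because of its units.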

Given a new object (20, 98, 25.6, 0.201) as a query, rank the objects in Table 1 by similarity to the query using the supremum distance. Then, identify which of the following is a true statement about the ranking.

	skin	insu	mass	pedi
x₁	19	88	33.6	0.627
x₂	20	188	26.6	0.351
x₃	28	128	23.3	0.672
x₄	21	94	28.1	0.167
x₅	34	168	43.1	2.288

Query: (20, 98, 25.6, 0.201)
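
The ranking can be checked with a short script. The sketch below computes the supremum (Chebyshev, L-infinity) distance from each object to the query, once on the raw values and once with each difference divided by the attribute's range width. Dividing by the width gives the same result as min-max normalizing first. The helper names `supremum_distance` and `rank` are illustrative, and the scaled variant assumes the exercise intends the listed ranges to be used for normalization.

```python
# Data from Table 1; attribute order is (skin, insu, mass, pedi).
objects = {
    "x1": (19, 88, 33.6, 0.627),
    "x2": (20, 188, 26.6, 0.351),
    "x3": (28, 128, 23.3, 0.672),
    "x4": (21, 94, 28.1, 0.167),
    "x5": (34, 168, 43.1, 2.288),
}
query = (20, 98, 25.6, 0.201)

# Range widths (max - min) taken from the attribute information.
widths = (99 - 0, 850 - 0, 70.0 - 0, 2.50 - 0.05)


def supremum_distance(a, b, widths=None):
    """Largest absolute per-attribute difference between a and b.

    If widths is given, each difference is divided by that attribute's
    range width before the maximum is taken.
    """
    if widths is None:
        widths = (1,) * len(a)
    return max(abs(p - q) / w for p, q, w in zip(a, b, widths))


def rank(objects, query, widths=None):
    """Return object names ordered from most to least similar to the query."""
    return sorted(
        objects,
        key=lambda name: supremum_distance(objects[name], query, widths),
    )


raw_ranking = rank(objects, query)
scaled_ranking = rank(objects, query, widths)
print("raw:   ", raw_ranking)
print("scaled:", scaled_ranking)
```

On the raw values insu decides almost every distance because its differences are the largest in absolute terms. With scaling, the order of the remaining objects changes while x4 stays first.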

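Attribute by attribute, the object closest to the query is:

skin (query 20): x₂ (20)

insu (query 98): x₄ (94)

mass (query 25.6): x₂ (26.6)

pedi (query 0.201): x₄ (0.167)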

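Under the supremum distance, an object's distance to the query is its largest absolute per-attribute difference. On the raw values the distances are x₄ = 4, x₁ = 10, x₃ = 30, x₅ = 70, and x₂ = 90, giving the ranking x₄, x₁, x₃, x₅, x₂. If the attributes are first min-max normalized with the ranges listed above, the distances are approximately x₄ = 0.036, x₂ = 0.106, x₁ = 0.174, x₃ = 0.192, and x₅ = 0.852, giving the ranking x₄, x₂, x₁, x₃, x₅. Either way x₄ is the most similar object, so the true statement about the ranking must be consistent with x₄ ranking first.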
