It is important to define or select similarity measures in data mining applications such as clustering, outlier analysis, and nearest-neighbor classification. However, studies show that no single similarity measure consistently outperforms the others in all situations. Nonetheless, seemingly different similarity measures may be equivalent after some transformations. Let us consider the five data objects in Table 1:
| | skin | insu | mass | pedi |
| x1 | 19 | 88 | 33.6 | 0.627 |
| x2 | 20 | 188 | 26.6 | 0.351 |
| x3 | 28 | 128 | 23.3 | 0.672 |
| x4 | 21 | 94 | 28.1 | 0.167 |
| x5 | 34 | 168 | 43.1 | 2.288 |
Table 1: Diabetes
Attribute information is listed below:
Triceps skin fold thickness in mm (skin): minimum value is 0 and maximum value is 99.
2-Hour serum insulin in mu U/ml (insu): minimum value is 0 and maximum value is 850.
Body mass index measured as weight in kg/(height in m)^2 (mass): minimum value is 0 and maximum value is 70.0.
Diabetes pedigree function (pedi): minimum value is 0.05 and maximum value is 2.50.
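The minimum and maximum values above are exactly what min-max normalization needs to put the four attributes on a common [0, 1] scale, so that the attribute with the widest range (insu) does not dominate a distance computation. The exercise does not say whether the attributes should be normalized, so the following is only a minimal sketch under that assumption; the names `RANGES`, `ATTRS`, and `min_max_normalize` are illustrative and not part of the original exercise.

```python
# Attribute ranges (min, max) as listed in the attribute information.
RANGES = {
    "skin": (0, 99),
    "insu": (0, 850),
    "mass": (0, 70.0),
    "pedi": (0.05, 2.50),
}
ATTRS = ("skin", "insu", "mass", "pedi")


def min_max_normalize(obj):
    """Map each attribute value to [0, 1] using (v - min) / (max - min)."""
    return tuple(
        (value - RANGES[attr][0]) / (RANGES[attr][1] - RANGES[attr][0])
        for attr, value in zip(ATTRS, obj)
    )


# Example: object x1 from Table 1.
x1 = (19, 88, 33.6, 0.627)
print(min_max_normalize(x1))
```

After normalization, a difference of 10 in insu counts for far less than a difference of 8 in mass, which is why a ranking on normalized values can differ from one on raw values.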
Given a new object (20, 98, 25.6, 0.201) as a query, rank the objects in Table 1 based on similarity with the query using Supremum distance. Then, identify which of the following is a true statement about the ranking.
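Solution
The supremum (Chebyshev, L-infinity) distance between an object x and the query q is the largest absolute difference over all attributes: d(x, q) = max_i |x_i - q_i|.
The question does not ask for normalization, so the raw attribute values are used. With q = (20, 98, 25.6, 0.201), the absolute differences per attribute are:
| | skin | insu | mass | pedi | supremum distance |
| x1 | 1 | 10 | 8.0 | 0.426 | 10 |
| x2 | 0 | 90 | 1.0 | 0.150 | 90 |
| x3 | 8 | 30 | 2.3 | 0.471 | 30 |
| x4 | 1 | 4 | 2.5 | 0.034 | 4 |
| x5 | 14 | 70 | 17.5 | 2.087 | 70 |
In every row the insu difference is the largest, so insu alone determines the distance.
Ranking from most similar to least similar: x4, x1, x3, x5, x2.
The answer options are not listed here; the true statement is the one that agrees with this ranking (x4 is the most similar to the query and x2 the least similar).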
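The supremum-distance ranking asked for in the question can be checked with a few lines of Python. This is a minimal sketch that assumes the raw (unnormalized) values from Table 1; the function names `supremum_distance` and `rank_by_supremum` are illustrative and do not come from the original exercise.

```python
# Query and objects from Table 1, in attribute order (skin, insu, mass, pedi).
query = (20, 98, 25.6, 0.201)
objects = {
    "x1": (19, 88, 33.6, 0.627),
    "x2": (20, 188, 26.6, 0.351),
    "x3": (28, 128, 23.3, 0.672),
    "x4": (21, 94, 28.1, 0.167),
    "x5": (34, 168, 43.1, 2.288),
}


def supremum_distance(a, b):
    """Chebyshev (L-infinity) distance: largest absolute per-attribute difference."""
    return max(abs(u - v) for u, v in zip(a, b))


def rank_by_supremum(objs, q):
    """Return object names sorted from most similar (smallest distance) to least."""
    return sorted(objs, key=lambda name: supremum_distance(objs[name], q))


ranking = rank_by_supremum(objects, query)
for name in ranking:
    print(name, supremum_distance(objects[name], query))
```

On raw values this yields the order x4, x1, x3, x5, x2, with distances 4, 10, 30, 70, and 90. If the attributes were first rescaled to a common range, the order could change, because insu would no longer dominate every comparison.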

