2. Consider three strings: "precision", "prediction" and "precise". [30 points]
1) List bi-grams and unique bi-grams for each string.
2) Use the Dice coefficient to compute the similarity of each pair of strings.
3) If we cluster strings when their similarity is greater than 0.5, will any of these strings be clustered together? Why?

Solution

Hi,
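First, a bigram is a sequence of 2 adjacent characters in a string. The bigrams of the given strings are:
precision - {pr, re, ec, ci, is, si, io, on}
prediction - {pr, re, ed, di, ic, ct, ti, io, on}
precise - {pr, re, ec, ci, is, se}
No bigram repeats within any of the three strings, so the unique bigrams of each string are the same as the lists above.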
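
If you want to check the lists above yourself, here is a minimal Python sketch. The function name `bigrams` is just my own choice for illustration, and it assumes a bigram means two adjacent characters, as defined above.

```python
def bigrams(s):
    """Return every pair of adjacent characters in s, in order."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


for word in ("precision", "prediction", "precise"):
    grams = bigrams(word)        # all bigrams, duplicates kept
    unique = sorted(set(grams))  # unique bigrams only
    print(word, grams, "total:", len(grams), "unique:", len(unique))
```

For these three strings the total and unique counts come out equal (8, 9 and 6), since no bigram occurs twice in the same word.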
2. The Dice coefficient is given by
s = 2 * nt / (nx + ny)
where nt is the number of bigrams found in both strings,
nx is the number of bigrams in the first string,
ny is the number of bigrams in the second string.
Let's take the example of precision and precise:
nx = size of {pr, re, ec, ci, is, si, io, on} = 8
ny = size of {pr, re, ec, ci, is, se} = 6
nt = size of the intersection of the above two sets = size of {pr, re, ec, ci, is} = 5
Therefore the similarity is s = 2*5/(8+6) = 0.71


For precision and prediction:
nx = size of {pr, re, ec, ci, is, si, io, on} = 8
ny = size of {pr, re, ed, di, ic, ct, ti, io, on} = 9
nt = size of the intersection of the above two sets = size of {pr, re, io, on} = 4
Therefore the similarity is s = 2*4/(8+9) = 0.47

For precise and prediction:
nx = size of {pr, re, ec, ci, is, se} = 6
ny = size of {pr, re, ed, di, ic, ct, ti, io, on} = 9
nt = size of the intersection of the above two sets = size of {pr, re} = 2
Therefore the similarity is s = 2*2/(6+9) = 0.27
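
The three calculations above can be reproduced with a short Python sketch. The helper names `bigram_set` and `dice` are my own, and returning 0.0 for strings with no bigrams is an assumption I added so that the division never fails.

```python
from itertools import combinations


def bigram_set(s):
    """Set of unique two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient s = 2 * nt / (nx + ny) over unique bigrams."""
    x, y = bigram_set(a), bigram_set(b)
    if not x and not y:
        return 0.0  # assumption: strings shorter than 2 characters score 0
    return 2 * len(x & y) / (len(x) + len(y))


words = ["precision", "prediction", "precise"]
scores = {(a, b): dice(a, b) for a, b in combinations(words, 2)}
for (a, b), s in scores.items():
    print(f"{a} / {b}: {s:.2f}")
# precision / prediction: 0.47
# precision / precise: 0.71
# prediction / precise: 0.27
```

The Dice coefficient is symmetric, so the order of the two strings in a pair does not change the score.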
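3. Now, if we cluster strings whose similarity is greater than 0.5, then precision and precise fall into the same cluster, because they are the only pair with s > 0.5. prediction stays in a cluster of its own, since its similarity to each of the other two strings is below 0.5.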
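
Here is a rough Python sketch of that clustering step. The question does not say which clustering method to use, so I am assuming the simplest rule: two strings end up in the same cluster whenever their Dice similarity is above the threshold (clusters that share a linked pair get merged). The names `cluster` and `threshold` are hypothetical, and the input words are assumed to be distinct.

```python
from itertools import combinations


def bigram_set(s):
    """Set of unique two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient over unique bigrams (strings of length >= 2)."""
    x, y = bigram_set(a), bigram_set(b)
    return 2 * len(x & y) / (len(x) + len(y))


def cluster(words, threshold=0.5):
    """Merge words into clusters when a pair scores above threshold."""
    clusters = [{w} for w in words]  # start with one cluster per word
    for a, b in combinations(words, 2):
        if dice(a, b) > threshold:
            ca = next(c for c in clusters if a in c)
            cb = next(c for c in clusters if b in c)
            if ca is not cb:
                # Drop b's cluster from the list and fold it into a's cluster.
                clusters = [c for c in clusters if c is not cb]
                ca |= cb
    return clusters


print(cluster(["precision", "prediction", "precise"]))
```

With the 0.5 threshold this gives two clusters: one holding precision and precise, and one holding only prediction.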

Thumbs up if this was helpful; otherwise let me know in the comments.


Tutor: Dr Jack