Machine Learning
In document clustering, ambiguity of words can be decreased by taking the context into account, for example, by considering pairs of words, as in “cocktail party” vs. “party elections.” Discuss how this can be implemented.
Solution
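In document clustering, the aim is to group similar documents; for example, news articles can be subdivided into politics, sports, national news, and so on. A document is represented as a bag of words: we predefine a lexicon of C words, and each document is a C-dimensional binary vector whose element i is 1 if word i appears in the document and 0 otherwise. Suffixes such as “–s” and “–ing” are removed to avoid duplicates in the word bag, and uninformative words such as “of,” “and,” and other conjunctions are not used. Documents are then grouped depending on the number of shared words.

With single words only, “cocktail party” and “party elections” share the word “party,” so documents about quite different topics can appear similar. To take the context into account, the lexicon can be extended with pairs of adjacent words, so that “cocktail party” and “party elections” each get their own element in the vector. Two documents then share such an element only when the same pair appears in both.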
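One possible way to implement the word-pair idea on top of the binary bag-of-words representation described above is sketched here in Python. It is only an illustration under stated assumptions: the function names (tokenize, features, build_lexicon, to_binary_vector, shared_terms), the short stop-word list, and the crude suffix rule are all made up for this example and are not part of the original question.

```python
# Assumed, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"of", "and", "the", "a", "an", "in", "to", "for", "is", "at"}

def strip_suffix(word):
    # Crude removal of "-ing" and "-s"; very short words are left alone.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    # Lowercase, strip surrounding punctuation, drop stop words, strip suffixes.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [strip_suffix(w) for w in words if w and w not in STOP_WORDS]

def features(text):
    # Single words plus pairs of adjacent words (pairs are formed after
    # stop-word removal, so a pair may bridge a dropped word).
    tokens = tokenize(text)
    unigrams = set(tokens)
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams

def build_lexicon(docs):
    # The lexicon holds every word and word pair seen in the collection.
    return sorted(set().union(*(features(d) for d in docs)))

def to_binary_vector(text, lexicon):
    # Element i is 1 if lexicon entry i occurs in the document, else 0.
    present = features(text)
    return [1 if term in present else 0 for term in lexicon]

def shared_terms(u, v):
    # Number of lexicon entries (words or pairs) present in both documents.
    return sum(a & b for a, b in zip(u, v))

if __name__ == "__main__":
    docs = [
        "The cocktail party was fun",
        "Party elections are coming",
        "A cocktail party at night",
    ]
    lexicon = build_lexicon(docs)
    vectors = [to_binary_vector(d, lexicon) for d in docs]
    print(shared_terms(vectors[0], vectors[2]))  # two cocktail-party documents
    print(shared_terms(vectors[0], vectors[1]))  # cocktail party vs. elections
```

In this sketch the two documents about a cocktail party share the pair “cocktail party” in addition to the single words, while the elections document shares only the word “party” with them, so the pair entries pull the two senses apart. Adding all pairs makes the lexicon much larger; in practice one would probably keep only pairs that occur often enough in the collection.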

