Abstract: PROCEDURE4. Find words that are semantically connected to the already disambiguated words at a connection distance of 0. The distance is computed from the WordNet hierarchy: two words are semantically connected at a distance of 0 if they belong to the same synset.
PROCEDURE5. Find words that are semantically connected with each other at a connection distance of 0.
PROCEDURE6. Find words that are semantically connected to the already disambiguated words at a connection distance of at most 1. Two words are semantically connected at a distance of at most 1 if they are synonyms or are linked by a hypernymy/hyponymy relation.
PROCEDURE7. Find words that are semantically connected with each other at a connection distance of at most 1.

The text to be disambiguated is first tokenized and part-of-speech tagged using Brill’s tagger. We also identify concepts based on WordNet definitions. Two sets of words are maintained: the set of ambiguous words, SAW, and the set of disambiguated words, SDW. Initially, all the words in the text are placed in SAW, and SDW is empty. As a word is disambiguated by one of the procedures, it is removed from SAW and added to SDW. The procedures above are applied iteratively until no more words can be disambiguated. This process identifies a set of nouns and verbs that can be disambiguated with high precision.

We evaluated the algorithm on 6 randomly selected files from SemCor. Each file was divided into sets of 15 sentences, and each set was used as input to the algorithm. The results show that about 55% of the nouns and verbs are disambiguated, with 91% accuracy.
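The iterative SAW/SDW loop with Procedures 4 to 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a tiny hand-written sense inventory (`SENSES`) and hypernym table (`HYPERNYM`) stand in for WordNet, distance 1 is approximated as a direct hypernym/hyponym link between synsets, a word is assigned a sense only when exactly one candidate sense (or sense pair) qualifies, and Procedures 1 to 3, which this excerpt does not describe, are omitted. All names and synset ids are hypothetical.

```python
from itertools import combinations

# Hypothetical toy sense inventory standing in for WordNet:
# each word maps to its candidate synset ids.
SENSES = {
    "car": ["car.n.01", "car.n.02"],
    "automobile": ["car.n.01"],
    "vehicle": ["vehicle.n.01", "vehicle.n.02"],
    "bank": ["bank.n.01", "bank.n.02"],
}
# Hypothetical direct hypernym links (child synset -> parent synset).
HYPERNYM = {"car.n.01": "vehicle.n.01", "car.n.02": "compartment.n.01"}


def distance(s1, s2):
    """0 if same synset, 1 if one is the direct hypernym of the other, else None."""
    if s1 == s2:
        return 0
    if HYPERNYM.get(s1) == s2 or HYPERNYM.get(s2) == s1:
        return 1
    return None


def connected(s1, s2, max_dist):
    d = distance(s1, s2)
    return d is not None and d <= max_dist


def against_disambiguated(saw, sdw, max_dist):
    """Procedures 4 and 6: connect an ambiguous word to an already chosen sense."""
    newly = {}
    for word in saw:
        hits = {s for s in SENSES.get(word, [])
                if any(connected(s, t, max_dist) for t in sdw.values())}
        if len(hits) == 1:  # assumption: assign only when exactly one sense qualifies
            newly[word] = hits.pop()
    return newly


def among_ambiguous(saw, max_dist):
    """Procedures 5 and 7: connect two still-ambiguous words to each other."""
    newly = {}
    for w1, w2 in combinations(sorted(saw), 2):
        pairs = [(s1, s2) for s1 in SENSES.get(w1, []) for s2 in SENSES.get(w2, [])
                 if connected(s1, s2, max_dist)]
        if len(pairs) == 1:  # assumption: a unique sense pair fixes both words
            s1, s2 = pairs[0]
            newly.setdefault(w1, s1)  # first unique pairing found wins
            newly.setdefault(w2, s2)
    return newly


def disambiguate(words):
    saw = set(words)  # SAW: every word starts out ambiguous
    sdw = {}          # SDW: word -> chosen synset, initially empty
    procedures = [
        lambda: against_disambiguated(saw, sdw, 0),  # PROCEDURE4
        lambda: among_ambiguous(saw, 0),             # PROCEDURE5
        lambda: against_disambiguated(saw, sdw, 1),  # PROCEDURE6
        lambda: among_ambiguous(saw, 1),             # PROCEDURE7
    ]
    changed = True
    while changed:  # stop once a full pass disambiguates nothing new
        changed = False
        for proc in procedures:
            newly = proc()
            for word, sense in newly.items():
                saw.discard(word)  # move the word from SAW ...
                sdw[word] = sense  # ... to SDW
            changed = changed or bool(newly)
    return sdw, saw


chosen, unresolved = disambiguate(["car", "automobile", "vehicle", "bank"])
print(chosen)      # {'automobile': 'car.n.01', 'car': 'car.n.01', 'vehicle': 'vehicle.n.01'}
print(unresolved)  # {'bank'}
```

In this toy run, Procedure 5 resolves "car" and "automobile" through their shared synset, which lets Procedure 6 then resolve "vehicle" through the hypernym link; "bank" has no connection at distance 0 or 1 and stays in SAW, matching the idea that only a subset of words is disambiguated.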