Automatic Word Sense Disambiguation and Construction Identification Based on Corpus Multilevel Annotation

Olga Lyashevskaya, Olga Mitrofanova, Maria Grachkova, Sergey Romanov, Anastasia Shimorina, Alexandra Shurygina

2011 (modified: 02 Nov 2021), TSD 2011

Abstract: The research project reported in this paper aims at the automatic extraction of linguistic information from contexts in the Russian National Corpus (RNC) and its subsequent use in building a comprehensive lexicographic resource, the Index of Russian lexical constructions. The proposed approach relies on automatic context classification for word sense disambiguation (WSD) and construction identification (CxI). The context processing procedure takes into account the following types of contextual information represented in the RNC multilevel annotation: lexical (lemma) tags (lex), morphological (grammatical) tags (gr), semantic (taxonomy) tags (sem), and combinations of these tag types. Multiple experiments on WSD and CxI are performed using representative RNC context samples for nouns. In each series of experiments we analyze (1) different context markers that signal the meaning of target words and (2) constructions that include context markers and target words.
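
To illustrate how the three annotation layers could feed context classification, here is a minimal sketch. It is not the authors' procedure: the token format, the function names `context_features` and `classify`, the overlap score, and the toy tags and sense profiles are all assumptions made for illustration, and the tags do not reproduce the actual RNC tagset.

```python
from collections import Counter

def context_features(tokens, target_index, window=2, layers=("lex", "gr", "sem")):
    """Collect layer-prefixed tag features from a window around the target word.

    Each token is a hypothetical dict mapping a layer name to a list of tags.
    """
    features = Counter()
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for i in range(start, end):
        if i == target_index:
            continue  # the target word itself is not a context marker
        for layer in layers:
            for tag in tokens[i].get(layer, []):
                features[f"{layer}={tag}"] += 1
    return features

def classify(features, sense_profiles):
    """Pick the sense whose profile overlaps most with the context features."""
    def overlap(profile):
        return sum(min(count, profile.get(name, 0)) for name, count in features.items())
    return max(sense_profiles, key=lambda sense: overlap(sense_profiles[sense]))

# Toy context for the ambiguous noun "luk" (onion vs. bow), transliterated.
# All tags below are invented for the example.
tokens = [
    {"lex": ["rezat"], "gr": ["V"], "sem": ["t:impact"]},
    {"lex": ["luk"], "gr": ["S"], "sem": []},
    {"lex": ["nozh"], "gr": ["S", "ins"], "sem": ["t:tool"]},
]

# Hypothetical per-sense profiles, e.g. counts gathered from labelled contexts.
sense_profiles = {
    "onion": Counter({"lex=rezat": 3, "sem=t:tool": 1, "sem=t:food": 4}),
    "bow": Counter({"lex=strelyat": 3, "sem=t:weapon": 2}),
}

features = context_features(tokens, target_index=1)
predicted_sense = classify(features, sense_profiles)
```

Passing a subset such as `layers=("sem",)` restricts the features to a single annotation layer, which is one way to compare tag types and their combinations in the spirit of the experiments described above.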
