Abstract: Numerous NLP applications rely on access to multilingual, diversified, context-sensitive, and broadly shared lexical-semantic information. Standard lexical resources tend to first encode monolithic, language-bound senses, which are subsequently translated and linked across repositories and languages. In this paper, we propose a novel approach for representing lexical-semantic knowledge in multiple languages, shared among them from the outset, based on the idea of the k-Multilingual Concept (\(MC^k\)). \(MC^k\)s consist of multilingual alignments of semantically equivalent words in k different languages that are generated through a defined linguistic context and linked via empirically determined semantic relations, without the use of any sense disambiguation process. The \(MC^k\) model makes it possible to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. We first present the conceptualization of \(MC^k\)s, along with the word alignment methodology that generates them. Second, we describe a large-scale automatic acquisition of \(MC^k\)s in English, Italian, and German based on the exploitation of corpora. Finally, we introduce MultiAlignNet, an original lexical resource built using the data gathered from the extraction task. Results from qualitative and quantitative assessments of the generated knowledge demonstrate both the quality and the novelty of the proposed model.
External IDs: dblp:conf/ekaw/GrassoRC22