Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach

Published: 01 Jan 2023, Last Modified: 13 May 2025, GWC 2023, CC BY-SA 4.0
Abstract: Focusing on the recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. Not all MWEs recorded in that lexical database can be considered lexicalised beyond doubt (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve our earlier method of distinguishing between lexicalised and non-lexicalised MWEs found in WordNet, which used custom-designed rule-based and statistical approaches. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (a random one and a majority-class one). The language model also proves to be better than a feature-based logistic regression model.
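The evaluation described in the abstract (F1 for the lexicalised class, compared against a majority-class baseline) can be illustrated with a minimal sketch. The labels, predictions, and function names below are hypothetical toy values chosen for illustration only; they are not the paper's data, code, or results.

```python
from collections import Counter
from typing import List, Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], target: str) -> float:
    """F1 for a single class: harmonic mean of precision and recall on `target`."""
    tp = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, pred) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(gold: Sequence[str]) -> List[str]:
    """Predict the most frequent gold label for every item."""
    majority_label = Counter(gold).most_common(1)[0][0]
    return [majority_label] * len(gold)


# Hypothetical toy labels: "lex" = lexicalised, "non" = non-lexicalised.
gold = ["lex", "lex", "non", "lex", "non", "non", "non", "lex", "non", "non"]
# Hypothetical classifier output, standing in for any model's predictions.
model_pred = ["lex", "lex", "non", "non", "non", "lex", "non", "lex", "non", "non"]

model_f1 = f1_for_class(gold, model_pred, "lex")
baseline_f1 = f1_for_class(gold, majority_baseline(gold), "lex")
print(f"model F1 (lex): {model_f1:.2f}, majority baseline F1 (lex): {baseline_f1:.2f}")
```

In this toy setup the majority label is "non", so the majority-class baseline never predicts the lexicalised class and its F1 for that class is zero. Whether this holds for the actual dataset depends on its class distribution, which the abstract does not state.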