A Data-Driven Approach to Idiomaticity in Russian MWEs Based on Experts' Criteria in Theoretical Linguistics

ACL ARR 2024 April Submission372 Authors

15 Apr 2024 (modified: 04 Jun 2024), ACL ARR 2024 April Submission, Readers: Everyone, License: CC BY 4.0
Abstract: The article presents a data analysis of 285 Russian multi-word expressions (MWEs) based on 15 lexical, grammatical and other criteria described in theoretical books and papers on the concept of idiomaticity. The MWEs were collected from the same theoretical sources as the criteria, and a group of linguistics experts annotated them according to these criteria. The distribution of scores in the annotated dataset shows that there are no absolutely idiomatic expressions, and that some expressions are clusters of several MWEs. Lexical criteria are among the top scorers and appear to be the most strongly manifested; grammatical criteria apply only under certain conditions; and the presence of obsolete words and obsolete grammatical forms influences whether an MWE can be replaced with a single word. The analysis can serve as the basis for a novel classification of MWEs and for a method of their automatic extraction.
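To make the kind of score aggregation the abstract describes more concrete, here is a minimal Python sketch. It is not the authors' method: it assumes, purely for illustration, that each expert gives a binary judgement (1 = the criterion holds, 0 = it does not) for every MWE and criterion, that an MWE's idiomaticity is the mean share of criteria it satisfies, and that criteria are ranked by their mean score across MWEs. The criterion names (`lexical`, `grammatical`, `one_word_substitute`), the two example MWEs, and all scores are hypothetical toy values, not data from the paper.

```python
from statistics import mean

# ASSUMPTION: annotations[mwe][criterion] is a list of binary expert
# judgements (1 = criterion holds, 0 = it does not). This scale and the
# criterion names below are hypothetical, not the paper's actual scheme.
Annotations = dict[str, dict[str, list[int]]]


def criterion_scores(annotations: Annotations) -> dict[str, dict[str, float]]:
    """Average the expert judgements for each (MWE, criterion) pair."""
    return {
        mwe: {crit: mean(votes) for crit, votes in crits.items()}
        for mwe, crits in annotations.items()
    }


def idiomaticity(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Mean share of criteria an MWE satisfies, in [0, 1].

    Under this toy definition, 1.0 would mean 'absolutely idiomatic'.
    """
    return {mwe: mean(crits.values()) for mwe, crits in scores.items()}


def rank_criteria(scores: dict[str, dict[str, float]]) -> list[str]:
    """Order criteria from most to least manifested across all MWEs.

    Assumes every MWE was annotated with the same set of criteria.
    """
    criteria = next(iter(scores.values())).keys()
    return sorted(
        criteria,
        key=lambda c: mean(s[c] for s in scores.values()),
        reverse=True,
    )


# Hypothetical toy data: three experts, three criteria, two MWEs.
toy: Annotations = {
    "бить баклуши": {
        "lexical": [1, 1, 1],
        "grammatical": [0, 1, 0],
        "one_word_substitute": [1, 1, 1],
    },
    "железная дорога": {
        "lexical": [1, 0, 1],
        "grammatical": [0, 0, 0],
        "one_word_substitute": [0, 0, 1],
    },
}

if __name__ == "__main__":
    scores = criterion_scores(toy)
    idi = idiomaticity(scores)
    for mwe, value in idi.items():
        print(f"{mwe}: {value:.3f}")
    print("Criteria by strength:", rank_criteria(scores))
    # No MWE reaches 1.0, i.e. none satisfies every criterion for every expert.
    print("Any absolutely idiomatic MWE:", any(v == 1.0 for v in idi.values()))
```

Averaging per criterion first and then across criteria keeps each criterion equally weighted regardless of how many experts judged it; a real analysis might instead weight criteria or model annotator agreement explicitly.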
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: multi-word expressions
Contribution Types: Data resources, Data analysis
Languages Studied: Russian
Submission Number: 372