Keywords: evaluation, evaluation methodologies, multiword expressions, machine translation
Abstract: Verbal multiword expressions (VMWEs) present significant challenges for natural language processing due to their complex and often non-compositional nature. While machine translation models have improved substantially with the advent of language models in recent years, accurately translating these complex linguistic structures remains an open problem. In this study, we analyze the impact of three VMWE categories (verbal idioms, verb-particle constructions, and light verb constructions) on machine translation quality from English to multiple languages. Using established multiword expression datasets and standard machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality, with deeper analysis indicating that this degradation is primarily attributable to the VMWE itself rather than general sentence-level difficulty.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: evaluation, evaluation methodologies, multiword expressions, machine translation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Czech, German, Spanish, Japanese, Russian, Turkish, Chinese
Submission Number: 6424