Understanding Translationese Effects in Multilingual Machine Translation

ACL ARR 2024 June Submission3373 Authors

16 Jun 2024 (modified: 02 Jul 2024) | ACL ARR 2024 June Submission | CC BY 4.0
Abstract: This study explores the impact of translationese on multilingual machine translation (MT). Using a newly curated, directed "one-way" parallel corpus from Global Voices (MSGV), featuring original texts in diverse languages and explicit annotation of actual translation directions, we evaluated the NLLB and TOWER models on MT tasks between English and five other languages. Our results reveal that translationese inputs are easier to translate into English but not out of English. Additionally, machine translations of translationese are lexically richer than those of original texts when translating into English. These findings suggest that multilingual MT systems experience different translationese effects compared to dedicated bilingual systems, underscoring the need for diverse test beds in MT evaluations. We contribute our dataset to enhance future research.
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Multilingualism and Cross-Lingual NLP, Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Spanish, Portuguese, French, Arabic, Bengali
Submission Number: 3373