Referenceless evaluation of machine translation models by ranking performance in Romanian-to-English translate-train settings

ACL ARR 2024 June Submission 1431 Authors

14 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: We propose a referenceless evaluation method for machine translation (MT) models that assesses their performance in translate-train scenarios across a variety of natural language processing (NLP) tasks. We compare four prominent MT tools by using them to translate task datasets from Romanian into English and investigate their impact on text summarization, sentiment analysis, and authorship identification. Our findings show that while translation significantly boosts performance on summarization and sentiment analysis, it adversely affects authorship identification in poetry. In response to the observed performance disparities among MT models, we develop a ranking system that aligns closely with human preferences. This system avoids reliance on professional ground-truth translations, which traditional MT evaluation metrics such as BLEU typically require and which can be biased by the quality of the reference and the proficiency of the translator. Our approach provides a more authentic measure of MT quality, reflecting more accurately how these models perform in practical applications.
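The abstract does not spell out the aggregation procedure, so the following is only a minimal sketch of the general idea: score each MT system by its downstream translate-train task performance, aggregate across tasks, rank, and compare the resulting ranking against human preferences. All system names, task scores, and the human ranking below are hypothetical placeholders, not results from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical downstream scores (higher is better) for four MT systems
# on three translate-train tasks; all names and numbers are illustrative.
scores = {
    "system_a": {"summarization": 0.42, "sentiment": 0.88, "authorship": 0.31},
    "system_b": {"summarization": 0.39, "sentiment": 0.85, "authorship": 0.34},
    "system_c": {"summarization": 0.44, "sentiment": 0.90, "authorship": 0.29},
    "system_d": {"summarization": 0.37, "sentiment": 0.82, "authorship": 0.33},
}

systems = sorted(scores)
tasks = sorted(next(iter(scores.values())))

def normalized_mean(system):
    # Min-max normalize each task so tasks on different score scales
    # contribute equally, then average the normalized scores per system.
    vals = []
    for task in tasks:
        col = [scores[s][task] for s in systems]
        lo, hi = min(col), max(col)
        vals.append((scores[system][task] - lo) / (hi - lo) if hi > lo else 0.5)
    return sum(vals) / len(vals)

aggregate = {s: normalized_mean(s) for s in systems}
ranking = sorted(systems, key=aggregate.get, reverse=True)
print("Referenceless ranking:", ranking)

# Agreement with a (hypothetical) human preference ranking, measured
# by Spearman's rank correlation; no reference translations needed.
human_ranking = ["system_c", "system_a", "system_b", "system_d"]
rho, _ = spearmanr(
    [ranking.index(s) for s in systems],
    [human_ranking.index(s) for s in systems],
)
print(f"Spearman correlation with human ranking: {rho:.2f}")
```

In this framing, the design choice is that no BLEU-style reference translation enters the pipeline: the only supervision is the labeled data of the downstream tasks themselves, which is what makes the evaluation referenceless.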
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: automatic evaluation, human evaluation, biases
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: Romanian, English
Submission Number: 1431