Abstract: According to Michael Halliday, language is not merely a system of rules but a tool for meaning-making within sociocultural contexts, whereby language choices shape the functions of a text. We employ Juliane House's Translation Quality Assessment model, inspired by Halliday's Systemic Functional Linguistics, to assess Machine Translation (MT) at the document level, introducing a novel approach titled $\textcolor{blue}{FALCON (\textbf{F}unctional \textbf{A}ssessment of \textbf{L}anguage and \textbf{CO}ntextuality in \textbf{N}arratives)}$. FALCON is a skill-specific evaluation framework offering a holistic view of document-level translation phenomena with fine-grained context-knowledge annotation. Rather than concentrating on textual quality, our approach examines the discourse quality of translation by defining a set of core criteria at the sentence level. To the best of our knowledge, this study represents the first attempt to extend MT evaluation into pragmatics. We revisit WMT 2024 with the English-to-X test set covering German, Spanish, and Icelandic, assessing 29 distinct systems across four domains. We present surprising yet compelling findings concerning document-level phenomena, which yield conclusions that differ from those established in existing research. Emphasizing the pivotal role of discourse analysis in current MT evaluation, our findings demonstrate a robust correlation with human judgments, including the ESA gold scores.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic evaluation of datasets, evaluation methodologies, evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English, Spanish, German, Icelandic
Keywords: automatic creation and evaluation of language resources, evaluation methodologies, linguistic theories, automatic evaluation
Submission Number: 628