Abstract: According to Michael Halliday, language is not merely a system of rules but a tool for meaning-making within sociocultural contexts, whereby language choices shape the functions of a text. We employ Juliane House's Translation Quality Assessment model, inspired by Halliday's Systemic Functional Linguistics, to assess Machine Translation (MT) at the document level, introducing a novel approach titled $\textcolor{blue}{FALCON (\textbf{F}unctional \textbf{A}ssessment of \textbf{L}anguage and \textbf{CO}ntextuality in \textbf{N}arratives)}$. FALCON is a skill-specific evaluation framework offering a holistic view of document-level translation phenomena with fine-grained context-knowledge annotation. Rather than concentrating on textual quality, our approach examines the discourse quality of translation by defining a set of core criteria at the sentence level. To the best of our knowledge, this study represents the first attempt to extend MT evaluation into pragmatics. We revisit WMT 2024 with the English-to-X test set covering German, Spanish, and Icelandic, assessing 29 distinct systems across four domains. We present surprising yet compelling findings concerning document-level phenomena, which yield conclusions that differ from those established in existing research. Emphasizing the pivotal role of discourse analysis in current MT evaluation, our findings demonstrate a robust correlation with human judgments, including the ESA gold scores.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic evaluation of datasets, evaluation methodologies, evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English, Spanish, German, Icelandic
Keywords: automatic creation and evaluation of language resources, evaluation methodologies, linguistic theories, automatic evaluation
Submission Number: 628