Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?

Chrysoula Zerva, Frédéric Blain, José Guilherme Camargo de Souza, Diptesh Kanojia, Sourabh Dattatray Deoghare, Nuno Miguel Guerreiro, Giuseppe Attanasio, Ricardo Rei, Constantin Orasan, Matteo Negri, Marco Turchi, Rajen Chatterjee, Pushpak Bhattacharyya, Markus Freitag, André Martins

Published: 01 Nov 2024, Last Modified: 09 Jan 2026. Proceedings of the Ninth Conference on Machine Translation. License: CC BY-SA 4.0
Abstract: We report the results of the WMT 2024 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. In this edition, we expanded our scope to assess the potential for quality estimates to help in the correction of translated outputs, hence including an automatic post-editing (APE) direction. We publish new test sets with human annotations that target two directions: providing new Multidimensional Quality Metrics (MQM) annotations for three multi-domain language pairs (English to German, Spanish, and Hindi), and extending the annotations on Indic languages by providing direct assessments and post-edits for translation from English into Hindi, Gujarati, Tamil, and Telugu. We also perform a detailed analysis of the behaviour of different models with respect to different phenomena, including gender bias, idiomatic language, and numerical and entity perturbations. We received submissions based both on traditional encoder-based approaches and on large language model (LLM) based ones.