Keywords: Machine translation, quality audit, low-resource languages, African languages, translation quality
Abstract: Large-scale translation projects for low-resource languages mostly rely on human translators to ensure cultural and linguistic fidelity. However, even professionally produced translations often contain subtle errors that are difficult to detect. Manual quality control at scale becomes prohibitively expensive, creating a major bottleneck in the development of high-quality Natural Language Processing (NLP) resources. Recent advances in multilingual large language models (LLMs) offer promising support for human-in-the-loop annotation workflows. In this work, we investigate the use of LLMs to assist in auditing translation quality, enabling more efficient quality control pipelines for low-resource African languages. We audit translations in 11 African languages from the MAFAND-MT dataset, combining LLM-as-a-judge, native-speaker human review, and automated metrics. Our quality-audited version of the MAFAND-MT test set yields performance gains across all languages, with BLEU improvements ranging from 0.4 to 9.27 points and chrF improvements ranging from 0.3 to 8.69 points. Our findings further indicate that state-of-the-art LLMs, such as GPT-5.1, can assist in auditing translation quality and suggesting candidate corrections for low-resource languages. However, they remain far from being a stand-alone solution for the automatic correction of human translations in African languages.
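To illustrate the automated metrics mentioned in the abstract, below is a minimal pure-Python sketch of a chrF-style character n-gram F-score. This is an assumption-laden simplification for illustration only: production evaluations typically use a standard toolkit such as sacrebleu, and this version omits details of the official chrF definition (e.g. whitespace handling options and corpus-level aggregation).

```python
from collections import Counter

def char_ngrams(text, n):
    # Collect character n-grams, ignoring spaces (a common chrF convention).
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average character n-gram precision and recall
    over n = 1..max_n, combined into an F-beta score (beta=2 weights
    recall more heavily, as in the original chrF metric)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) * 100
```

Comparing `chrf(system_output, audited_reference)` against `chrf(system_output, original_reference)` is the kind of before/after contrast the abstract's score improvements refer to.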
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine translation, quality audit, low-resource languages, African languages, translation quality
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Amharic, Hausa, Igbo, Kinyarwanda, Luo (Dholuo), Nigerian Pidgin, Shona, Swahili (Kiswahili), Tswana (Setswana), Twi (Akan-Twi), Yoruba
Submission Number: 7244