Diagnose, Then Repair: A Two-Stage MQM-Guided Post-Editing Framework for Domain-Specific Machine Translation

Published: 18 Apr 2026, Last Modified: 26 Apr 2026
Venue: ACL 2026 Industry Track Poster
License: CC BY 4.0
Keywords: Automatic Post-Editing, MQM-Style Evaluation, Retrieval-Augmented Generation, Domain-Specific Machine Translation
Abstract: LLM-based machine translation evaluation can closely match human judgments, but in practice it remains largely diagnostic: its signals rarely translate into direct quality improvements under real production constraints. We propose a two-stage, evaluator-guided automatic post-editing framework that turns MQM-style evaluation into targeted repairs. First, a retrieval-augmented LLM evaluator outputs structured, span-level MQM diagnoses under an explicit edit contract. Second, a separate LLM post-editor applies minimal edits restricted to those diagnoses. This separation improves controllability and reduces paraphrastic drift compared to one-stage "judge-and-refine" baselines. In a systematic study involving seven LLMs spanning three model providers and seven languages, our best configuration consistently improves both reference-based COMET and CometKiwi scores over one-stage post-editing methods, while the evaluator's error spans and severities show strong agreement with human MQM annotations and human editor preferences.
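The diagnose-then-repair split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Diagnosis` and `Edit` types, the character-offset span format, the category and severity labels, and the rule that an edit is accepted only if it falls entirely inside a diagnosed span are all assumptions made for illustration. Both LLM stages are replaced by hard-coded outputs so the contract check itself is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical evaluator output: one span-level, MQM-style error."""
    start: int      # inclusive character offset in the MT output
    end: int        # exclusive character offset
    category: str   # illustrative label, e.g. "terminology"
    severity: str   # illustrative label, e.g. "minor" or "major"


@dataclass(frozen=True)
class Edit:
    """Hypothetical post-editor output: replace text[start:end]."""
    start: int
    end: int
    replacement: str


def within_contract(edit: Edit, diagnoses: list[Diagnosis]) -> bool:
    """Assumed contract: an edit must lie fully inside one diagnosed span."""
    return any(d.start <= edit.start and edit.end <= d.end for d in diagnoses)


def apply_edits(text: str, edits: list[Edit], diagnoses: list[Diagnosis]) -> str:
    """Apply only contract-compliant edits; drop everything else.

    Edits are assumed not to overlap. They are applied right to left so
    that the offsets of earlier edits remain valid.
    """
    accepted = [e for e in edits if within_contract(e, diagnoses)]
    for e in sorted(accepted, key=lambda e: e.start, reverse=True):
        text = text[:e.start] + e.replacement + text[e.end:]
    return text


# Stage 1 (stand-in for the evaluator): flag "drug" as a terminology error.
mt_output = "The patient took the drug twice a day."
diagnoses = [Diagnosis(21, 25, "terminology", "major")]

# Stage 2 (stand-in for the post-editor): one targeted fix, plus one
# edit outside any diagnosed span that the contract rejects.
edits = [Edit(21, 25, "medication"), Edit(0, 3, "This")]

repaired = apply_edits(mt_output, edits, diagnoses)
print(repaired)
```

In this toy setup the out-of-span edit is silently dropped, which is one simple way to limit paraphrastic drift; a production system might instead log or re-prompt on rejected edits.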
Submission Type: Emerging
Submission Number: 404