An Evaluation Resource for Grounding Translation Errors

ACL ARR 2025 May Submission 7529 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Machine translation systems inevitably make translation errors. Studying these errors is an important step towards building error-free translation systems. Fine-grained error analysis by LLMs has gained increasing attention in machine translation, but such analyses do not ground the errors in the reasons why the annotated text spans are erroneous. In this paper, we evaluate whether LLMs really know such reasons when grounding translation errors, by manually building an evaluation resource through a bi-directional grounding scheme. In the forward direction, we annotate an explanation of the reason for each error span. In the backward direction, we annotate the error span given its explanation, with the error span masked. If the error spans from both directions are consistent, we deem the explanation valid. This grounding process regulates the explanation so as to avoid subjective bias. We evaluate LLMs' grounding ability on this resource, and the results show that LLMs perform significantly worse than humans in both directions. Furthermore, we apply error grounding to filter falsely alarmed errors, and achieve significant improvements in translation error detection.
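The abstract does not specify how consistency between the forward and backward error spans is computed; the sketch below is one plausible reading, assuming character-level spans and an overlap-based agreement criterion (the function name, span representation, and threshold are illustrative assumptions, not the paper's specification).

```python
def spans_consistent(forward_span, backward_span, min_overlap=1.0):
    """Check whether the forward-annotated span and the span recovered in the
    backward direction agree enough to validate the explanation.

    Spans are (start, end) character offsets into the translation, end-exclusive.
    min_overlap is the required fraction of the shorter span covered by the overlap;
    1.0 demands that one span be fully contained in the other.
    """
    f_start, f_end = forward_span
    b_start, b_end = backward_span

    # Length of the character-level intersection of the two spans.
    overlap = max(0, min(f_end, b_end) - max(f_start, b_start))
    shorter = min(f_end - f_start, b_end - b_start)
    if shorter <= 0:
        return False
    return overlap / shorter >= min_overlap


# Example: forward annotation marks characters 10-18, backward recovery marks 10-20.
print(spans_consistent((10, 18), (10, 20)))            # True: full containment
print(spans_consistent((10, 18), (25, 30)))            # False: no overlap
print(spans_consistent((10, 18), (14, 30), 0.5))       # True under a looser threshold
```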
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: benchmarking, evaluation methodologies, fine-grained error analysis, machine translation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English; Chinese; German
Submission Number: 7529