An Evaluation Resource for Grounding Translation Errors

ACL ARR 2025 February Submission2096 Authors

14 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Fine-grained manual annotation of translation errors has gained increasing attention in machine translation, but these annotations do not ground the errors in the reasons why the annotated text spans are erroneous, which makes it hard to evaluate the trustworthiness of LLMs in fine-grained error analysis. In this paper, we manually build an evaluation resource for grounding translation errors through a bi-directional grounding scheme. In the forward direction, we annotate an explanation of the reason for each error span. In the backward direction, we annotate the error span given its explanation, with the error span masked. If the error spans from both directions are consistent, we deem the explanation valid. Such a grounding process regulates the explanations and avoids subjective bias. We evaluate LLMs' ability to ground translation errors on this resource. The results show that LLMs perform significantly worse than humans in both directions. Furthermore, we apply error grounding to filter falsely alarmed errors and achieve significant improvement in translation error detection.
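The abstract's bi-directional scheme hinges on a consistency check between the forward-annotated error span and the span recovered in the backward direction. Below is a minimal sketch of such a check, not the authors' code: it assumes spans are given as character offsets and that consistency is measured by character-overlap F1 against an illustrative threshold of 0.5; the function names and threshold are assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation) of the forward/backward
# span-consistency check described in the abstract. Span representation,
# the overlap metric, and the 0.5 threshold are assumptions.

def char_overlap_f1(span_a: tuple[int, int], span_b: tuple[int, int]) -> float:
    """F1 over character offsets of two (start, end) spans, end exclusive."""
    a = set(range(*span_a))
    b = set(range(*span_b))
    if not a or not b:
        return 0.0
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def explanation_is_valid(forward_span: tuple[int, int],
                         backward_span: tuple[int, int],
                         threshold: float = 0.5) -> bool:
    """Keep an explanation only if the span located from it (backward)
    is consistent with the originally annotated span (forward)."""
    return char_overlap_f1(forward_span, backward_span) >= threshold


# Example: the forward annotation marks characters 10..18 as erroneous; the
# backward pass, given only the explanation with the span masked, recovers 12..18.
print(explanation_is_valid((10, 18), (12, 18)))  # True under the assumed threshold
```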
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: benchmarking, evaluation methodologies, fine-grained error analysis, machine translation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English; Chinese; German
Submission Number: 2096