Abstract: In recent years, large language models (LLMs) have significantly advanced document-level translation quality, leveraging their powerful text-generation and context-understanding capabilities. However, because document-level translation generates output holistically rather than sentence by sentence, it often suffers from over-translation and under-translation and lacks sentence-level alignment information, posing substantial challenges for quality assessment. Existing evaluation methods (e.g., BERTScore, COMET) are limited by their maximum input length, making them impractical to apply directly to document-level translation. To address these issues, we propose an automatic evaluation framework based on alignment algorithms. Our approach combines sentence segmentation tools with dynamic programming to construct sentence-level alignments between the source and translated texts, then adapts sentence-level evaluation models to document-level assessment via sliding-window aggregation. Experiments show that our method evaluates document-level translation quality efficiently and accurately, offering a reliable tool for future research.
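To make the alignment step concrete, the following is a minimal sketch of how sentence-level alignments could be built with dynamic programming, assuming the documents have already been segmented into sentences. The cost function (a length-ratio penalty), the gap cost, and the restriction to 1-1 / 1-0 / 0-1 alignment moves are illustrative assumptions, not the paper's actual scoring or tooling.

    # Hypothetical DP sentence alignment between a source document and its
    # document-level translation (both pre-segmented into sentence lists).
    def align_sentences(src_sents, tgt_sents, gap_cost=1.0):
        """Return a list of (src_index, tgt_index) pairs; None marks an unaligned side."""
        n, m = len(src_sents), len(tgt_sents)
        # dp[i][j]: minimal cumulative cost of aligning the first i source
        # and first j target sentences.
        dp = [[float("inf")] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0

        def match_cost(s, t):
            # Assumed cost: penalize length mismatch as a rough proxy for
            # over- or under-translation of a sentence.
            ls, lt = max(len(s), 1), max(len(t), 1)
            return abs(ls - lt) / max(ls, lt)

        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:  # 1-1 alignment
                    c = dp[i - 1][j - 1] + match_cost(src_sents[i - 1], tgt_sents[j - 1])
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i - 1, j - 1)
                if i > 0:            # source sentence with no counterpart (under-translation)
                    c = dp[i - 1][j] + gap_cost
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i - 1, j)
                if j > 0:            # target sentence with no counterpart (over-translation)
                    c = dp[i][j - 1] + gap_cost
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i, j - 1)

        # Trace back from (n, m) to recover the alignment path.
        pairs, i, j = [], n, m
        while (i, j) != (0, 0):
            pi, pj = back[i][j]
            if pi == i - 1 and pj == j - 1:
                pairs.append((i - 1, j - 1))
            elif pi == i - 1:
                pairs.append((i - 1, None))
            else:
                pairs.append((None, j - 1))
            i, j = pi, pj
        return list(reversed(pairs))

Under this sketch, the resulting 1-1 pairs could then be scored with a sentence-level metric such as COMET and aggregated over a sliding window of neighboring sentences to obtain a document-level score, with unaligned sentences surfacing as candidate over- or under-translations.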
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Document-Level Metric, Document-Level Machine Translation
Contribution Types: NLP engineering experiment
Languages Studied: English, French, Japanese, and Chinese
Submission Number: 7298