Abstract: In recent years, large language models (LLMs) have significantly advanced document-level translation quality, leveraging their powerful text-generation and context-understanding capabilities. However, because document-level translation generates output holistically rather than sentence by sentence, it often suffers from over-translation and under-translation and lacks sentence-level alignment information, posing substantial challenges for quality assessment. Existing evaluation methods (e.g., BERTScore, COMET) are limited by their maximum input length, making them impractical to apply directly to document-level translation. To address these issues, we propose an automatic evaluation framework based on alignment algorithms. Our approach combines sentence segmentation tools with dynamic programming to construct sentence-level alignments between the source and translated texts, then adapts sentence-level evaluation models to document-level assessment via sliding-window aggregation. Experiments show that our method evaluates document-level translation quality efficiently and accurately, offering a reliable tool for future research.
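To make the alignment step concrete, the following is a minimal sketch of how sentence-level alignments could be built with dynamic programming, assuming the documents have already been segmented into sentences. The cost function (a length-ratio penalty), the gap cost, and the restriction to 1-1 / 1-0 / 0-1 alignment moves are illustrative assumptions, not the paper's actual scoring or tooling.

    # Hypothetical DP sentence alignment between a source document and its
    # document-level translation (both pre-segmented into sentence lists).
    def align_sentences(src_sents, tgt_sents, gap_cost=1.0):
        """Return a list of (src_index, tgt_index) pairs; None marks an unaligned side."""
        n, m = len(src_sents), len(tgt_sents)
        # dp[i][j]: minimal cumulative cost of aligning the first i source
        # and first j target sentences.
        dp = [[float("inf")] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0

        def match_cost(s, t):
            # Assumed cost: penalize length mismatch as a rough proxy for
            # over- or under-translation of a sentence.
            ls, lt = max(len(s), 1), max(len(t), 1)
            return abs(ls - lt) / max(ls, lt)

        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:  # 1-1 alignment
                    c = dp[i - 1][j - 1] + match_cost(src_sents[i - 1], tgt_sents[j - 1])
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i - 1, j - 1)
                if i > 0:            # source sentence with no counterpart (under-translation)
                    c = dp[i - 1][j] + gap_cost
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i - 1, j)
                if j > 0:            # target sentence with no counterpart (over-translation)
                    c = dp[i][j - 1] + gap_cost
                    if c < dp[i][j]:
                        dp[i][j], back[i][j] = c, (i, j - 1)

        # Trace back from (n, m) to recover the alignment path.
        pairs, i, j = [], n, m
        while (i, j) != (0, 0):
            pi, pj = back[i][j]
            if pi == i - 1 and pj == j - 1:
                pairs.append((i - 1, j - 1))
            elif pi == i - 1:
                pairs.append((i - 1, None))
            else:
                pairs.append((None, j - 1))
            i, j = pi, pj
        return list(reversed(pairs))

Under this sketch, the resulting 1-1 pairs could then be scored with a sentence-level metric such as COMET and aggregated over a sliding window of neighboring sentences to obtain a document-level score, with unaligned sentences surfacing as candidate over- or under-translations.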
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Document-Level Metric, Document-Level Machine Translation
Contribution Types: NLP engineering experiment
Languages Studied: English, French, Japanese, and Chinese
Submission Number: 7298