Keywords: Document-level Machine Translation, Preference Optimization, Large Language Models, Quality Estimation
Abstract: Large Language Models (LLMs) have enabled a shift from sentence-level to document-to-document (Doc2Doc) machine translation, promising improved global coherence. However, Doc2Doc generation in a single pass frequently suffers from structural misalignment, manifesting as sentence omissions or hallucinations that violate the core requirement of source-target correspondence. To address this, we introduce the **S**entence **T**ranslation **A**lignment **R**ate (**STAR**), an auxiliary metric that explicitly quantifies sentence-level structural fidelity. Building on this, we propose **STAR**-masked **P**reference **O**ptimization (**StarPO**), a framework that ranks document-level hypotheses by structural quality and applies a dynamic alignment mask to focus optimization on misaligned segments. Experimental results across news and literary domains demonstrate that StarPO significantly enhances both translation quality and structural integrity. Notably, StarPO allows compact models (e.g., Qwen3-4B) to surpass massive proprietary systems such as GPT-4o while maintaining superior token efficiency. We will release our code and datasets.
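As a rough illustration of the structural-fidelity idea described in the abstract, a sentence-alignment rate could be sketched as below. This is a minimal sketch under assumptions: the function name `star`, the alignment representation, and the one-to-one alignment criterion are all hypothetical, since the abstract does not give STAR's exact definition.

```python
def star(source_sents, align):
    """Hypothetical sentence-alignment rate: the fraction of source
    sentences aligned to exactly one target sentence.

    `align` maps a source-sentence index to the list of target-sentence
    indices produced by some sentence aligner (an assumption here);
    omissions (no target) and hallucination-style splits (multiple
    targets) both count against the score.
    """
    if not source_sents:
        return 0.0
    aligned = sum(1 for i in range(len(source_sents))
                  if len(align.get(i, [])) == 1)
    return aligned / len(source_sents)

# Example: 3 source sentences; the second is omitted in the hypothesis,
# so only source sentences 0 and 2 have exactly one aligned target.
src = ["S1.", "S2.", "S3."]
alignment = {0: [0], 2: [1]}  # source index -> target indices
print(star(src, alignment))   # prints 0.6666666666666666
```

Under this sketch, a structurally faithful Doc2Doc hypothesis scores 1.0, and the score degrades as omissions or spurious sentences accumulate, which is the property a preference-optimization framework like StarPO could rank hypotheses by.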
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese, Russian, German, Spanish
Submission Number: 5983