Learning to Translate by Translating: Stabilizing the Dual Loop via Semantic-Aware Self-Evolution

ACL ARR 2026 January Submission 8517 Authors

06 Jan 2026 (modified: 20 Mar 2026), License: CC BY 4.0
Keywords: Large Language Model, Machine Translation, Reinforcement Learning
Abstract: Despite the remarkable success of Large Language Models (LLMs) in Machine Translation (MT), the scarcity of high-quality parallel corpora and the prohibitive cost of their acquisition constrain scalability. To this end, we propose \textbf{L}earning to \textbf{T}ranslate by \textbf{T}ranslating (\textbf{LTT}), an LLM-driven dual-learning framework that enables autonomous translation, achieving an 80.42\% performance improvement over the base model. By adapting the cycle-consistency principle to the generative paradigm, LTT eliminates the need for parallel data. It employs a robust semantic-aware reward function that balances adequacy with reconstruction fidelity, effectively mitigating the reward hacking issues inherent in traditional unsupervised MT. Relying solely on monolingual data, our 8B model consistently outperforms significantly larger models ($70$B+) in low-resource settings and achieves parity with state-of-the-art supervised baselines on mainstream benchmarks. LTT thus offers a scalable, data-efficient paradigm for autonomous machine translation.
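The abstract's core mechanism, a dual-loop reward that balances forward adequacy against round-trip reconstruction fidelity, can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: `bow_cosine` is a toy bag-of-words stand-in for a real semantic similarity model, `translation_quality` stands in for a reference-free adequacy score, and the mixing weight `alpha` is hypothetical.

```python
# Hypothetical sketch of a semantic-aware dual-loop reward (not the paper's code).
from collections import Counter
from math import sqrt

def bow_cosine(a: str, b: str) -> float:
    """Toy stand-in for a semantic similarity model (e.g. an embedding encoder)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def dual_loop_reward(source: str,
                     translation_quality: float,
                     reconstruction: str,
                     alpha: float = 0.5) -> float:
    """Mix forward adequacy with round-trip reconstruction fidelity.

    `translation_quality` is a [0, 1] adequacy score for src -> tgt;
    `reconstruction` is the tgt -> src back-translation of the candidate.
    Rewarding reconstruction alone invites reward hacking (e.g. copying the
    source verbatim through the loop), so the two terms are combined.
    """
    fidelity = bow_cosine(source, reconstruction)
    return alpha * translation_quality + (1.0 - alpha) * fidelity
```

With `alpha = 0.5`, a candidate whose back-translation exactly matches the source (fidelity 1.0) but whose forward adequacy is 0.8 scores 0.9, while a degenerate copy with perfect fidelity and near-zero adequacy is penalized, which is the hacking-mitigation intuition the abstract describes.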
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: zero-shot MT, MT theory, multilingual MT
Contribution Types: Theory
Languages Studied: English, Chinese
Submission Number: 8517