Beyond Parallel Corpora: Unlocking Autonomous Machine Translation via Autocritical Reinforcement Learning
Keywords: Large Language Model, Machine Translation, Reinforcement Learning
Abstract: Despite the remarkable success of Large Language Models (LLMs) in Machine Translation (MT), the substantial computational cost of fine-tuning-based solutions constrains their scalability and impedes further progress in this area. To this end, we propose \textbf{Self-Trans}, a new paradigm that uses only ubiquitous monolingual data while achieving an 80.42\% performance improvement in MT. Self-Trans is a reference-free reinforcement learning framework that learns through self-assessment: it generates its own supervision by evaluating the consistency of round-trip translations, guided by a carefully designed reward function that balances semantic adequacy with reconstruction fidelity and prevents reward hacking. Trained solely on low-resource language pairs, our method consistently outperforms much larger models (70B+), and the Self-Trans-8B model achieves results comparable to state-of-the-art baselines on most mainstream benchmarks. In conclusion, Self-Trans frees MT from the parallel-data constraints of existing approaches and offers an efficiently scalable paradigm for the future development of autonomous machine translation.
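To make the round-trip self-assessment idea concrete, here is a minimal sketch of what such a reward might look like. All names and choices here (the `translate_fwd`/`translate_bwd`/`semantic_similarity` callables, the equal 0.5/0.5 weighting, and the copy-through guard) are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of a round-trip consistency reward for reference-free RL.
# Every name and weight below is an assumption for illustration only.

def round_trip_reward(source: str,
                      translate_fwd,        # source language -> target language
                      translate_bwd,        # target language -> source language
                      semantic_similarity,  # returns a score in [0, 1]
                      ) -> float:
    """Score a candidate translation by how faithfully the source survives a round trip."""
    forward = translate_fwd(source)          # candidate translation
    reconstruction = translate_bwd(forward)  # back-translation into the source language

    # Semantic adequacy: how close the reconstruction is to the original source.
    adequacy = semantic_similarity(source, reconstruction)

    # Reconstruction fidelity at the surface level (token overlap), so the
    # policy cannot satisfy the adequacy metric with loose paraphrases alone.
    src_tokens, rec_tokens = set(source.split()), set(reconstruction.split())
    fidelity = len(src_tokens & rec_tokens) / max(len(src_tokens), 1)

    # Guard against an obvious reward-hacking strategy: emitting the source
    # verbatim as the "translation" makes the round trip perfect for free.
    if forward.strip() == source.strip():
        return 0.0

    return 0.5 * adequacy + 0.5 * fidelity
```

Under these assumptions, the scalar reward would drive a standard policy-gradient update on the forward translation model, with monolingual text as the only data requirement.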
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18284