Abstract: Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune MLLMs to obtain the gains of MBR without any additional computation at inference time. Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
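As background, MBR decoding selects, from a set of sampled hypotheses, the candidate with the highest expected utility against the other hypotheses, and DPO fine-tunes a model on preference pairs without an explicit reward model. A minimal sketch of the standard formulations follows; the notation ($\mathcal{H}$, $u$, $\beta$, $\pi_{\mathrm{ref}}$, $y_w$, $y_l$) is the usual one from the MBR and DPO literature, not necessarily this paper's.

% Standard MBR decision rule over a sampled hypothesis set H,
% with utility u (e.g., BLEU or a neural metric):
$$\hat{y}_{\mathrm{MBR}} \;=\; \operatorname*{arg\,max}_{y \in \mathcal{H}} \; \frac{1}{|\mathcal{H}|} \sum_{y' \in \mathcal{H}} u(y, y')$$

% Standard DPO objective (Rafailov et al., 2023), where y_w is the
% preferred and y_l the dispreferred response for source x:
$$\mathcal{L}_{\mathrm{DPO}} \;=\; -\,\mathbb{E}_{(x,\, y_w,\, y_l)} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]$$

One natural construction consistent with the abstract is to rank sampled translations by MBR utility and take high- and low-ranked candidates as $y_w$ and $y_l$; the paper's exact pair-construction recipe may differ.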