Keywords: machine translation, preference alignment, large language models
Abstract: Although Large Language Models (LLMs) such as GPT-4 excel at machine translation, their high cost and limited scalability make them impractical in many scenarios. Recently, there has been increasing effort to build smaller LLMs that achieve comparable performance. However, typical instruction tuning methods simply mimic reference translations, leading to suboptimal results; recent preference optimization methods improve on this, yet they still fail to exploit crucial preference information at inference time. In this paper, we introduce Preference-Enhanced Instruction Tuning (PEIT), a novel method that explicitly incorporates preferences into both the instruction fine-tuning and inference phases. Extensive experiments show that PEIT not only improves translation quality but also significantly outperforms state-of-the-art preference optimization methods and instruction tuning baselines across multiple language benchmarks.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4164