The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

ACL ARR 2025 May Submission3154 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Prior research diverges on the role of language diversity in LLM fine-tuning: some studies report benefits, while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically reconcile these discrepancies. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and, surprisingly, supervised pairs, even though less diverse models are fine-tuned exclusively on those supervised pairs. However, the benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity yields more language-agnostic representations, and these representational adaptations help explain the improved performance of models fine-tuned with greater diversity.
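As a rough illustration of the "language-agnostic representations" claim (not the paper's actual analysis), one common way to probe representation language-agnosticism is to compare mean-pooled hidden states of parallel sentences across languages; the model name and sentences below are placeholders chosen for the sketch.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder multilingual model; the paper's fine-tuned LLMs are not public here.
MODEL_NAME = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()


def mean_pooled(text: str) -> torch.Tensor:
    """Mean-pool the last hidden state over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, dim)


# Illustrative parallel sentences; higher cross-lingual similarity suggests
# more language-agnostic representations.
src = "The cat sits on the mat."
tgt = "Die Katze sitzt auf der Matte."

sim = torch.nn.functional.cosine_similarity(mean_pooled(src), mean_pooled(tgt))
print(f"Cross-lingual cosine similarity: {sim.item():.3f}")
```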
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Multilingualism and Cross-Lingual NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: Afrikaans, Bulgarian, Chinese, Czech, Danish, Dutch, English, German, Icelandic, Japanese, Korean, Polish, Russian, Slovak, Swedish, Ukrainian, Vietnamese
Submission Number: 3154