One Model, Many Worlds: Cross-Lingual Fine-Tuning Can Improve Low-Resource Capabilities of Language Models
Keywords: large language models, fine-tuning, generalization, multilingual
TL;DR: This paper examines fine-tuning efficacy to determine the relative importance of language, domain, and resource level, and explores how performance disparities between high- and low-resource languages can be reduced.
Abstract: Multilingual large language models (LLMs) have demonstrated strong cross-lingual reasoning and comprehension capabilities. However, substantial performance disparities persist between high- and low-resource languages due to the imbalanced availability of training data and to linguistic diversity. This paper examines fine-tuning efficacy to determine the relative importance of language, domain, and resource level, exploring how these performance disparities can be reduced. Using gpt-4.1-nano-2025-04-14, we conducted experiments on three domains from the Global-MMLU dataset: STEM, Medical, and Humanities, focusing primarily on cross-lingual transfer. We find substantial accuracy improvements when transferring from high- to low-resource settings (≈+16%), but a large performance degradation when transferring in the opposite direction (≈-13%). Additionally, we find that only cross-lingual transfers (+2.61%) demonstrate a net improvement, while cross-domain transfers (-2.44%) degrade performance. These findings present preliminary evidence that training data from linguistically diverse languages can enhance model generalization and narrow the performance gap in multilingual language models, even when low-resource language data is scarce or absent altogether.
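To make the experimental setup more concrete, below is a minimal sketch of how Global-MMLU questions could be grouped into the paper's three domains for a given language before fine-tuning or evaluation. The Hugging Face dataset identifier, the split name, the `subject` field, the subject-to-domain mapping, and the language codes are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (assumptions labeled): group Global-MMLU questions into the
# paper's three coarse domains (STEM, Medical, Humanities) for one language.
from collections import defaultdict
from datasets import load_dataset

# Illustrative mapping from MMLU subjects to the paper's domains (subset only;
# the actual subject-to-domain assignment used in the paper may differ).
DOMAIN_MAP = {
    "college_physics": "STEM",
    "college_mathematics": "STEM",
    "clinical_knowledge": "Medical",
    "college_medicine": "Medical",
    "philosophy": "Humanities",
    "world_religions": "Humanities",
}


def domain_splits(lang: str):
    """Group Global-MMLU test questions for `lang` into coarse domains.

    Assumes the public Hugging Face release "CohereForAI/Global-MMLU" with
    per-language configs and a "subject" field on each row.
    """
    ds = load_dataset("CohereForAI/Global-MMLU", lang, split="test")
    splits = defaultdict(list)
    for row in ds:
        domain = DOMAIN_MAP.get(row["subject"])
        if domain is not None:
            splits[domain].append(row)
    return splits


if __name__ == "__main__":
    # Illustrative high- vs. low-resource comparison (English vs. Swahili).
    for lang in ("en", "sw"):
        sizes = {d: len(rows) for d, rows in domain_splits(lang).items()}
        print(lang, sizes)
```

A setup along these lines would produce per-language, per-domain question pools from which fine-tuning and held-out evaluation sets for the cross-lingual and cross-domain transfer conditions could be drawn.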
Submission Number: 28