How Well Does Mathematical Reasoning Transfer Across Languages in English-Centric LLMs?

ACL ARR 2026 March Submission1974 Authors

17 Mar 2026 (modified: 07 Jun 2026) | ACL ARR 2026 March Submission | Everyone | CC BY 4.0
Keywords: Large Language Model, Reasoning Generalization, Cross-lingual Transfer
Abstract: Recent advances in Reinforcement Post-Training (RPT) have substantially improved the mathematical reasoning capabilities of large language models (LLMs), but it remains unclear how well such gains transfer across languages. In this work, we study the cross-lingual transfer of mathematical reasoning in English-centric LLMs under controlled multilingual evaluation. We systematically evaluate English-centric reasoning models on multilingual mathematical reasoning benchmarks, and the results show that cross-lingual transfer is highly variable across initial model choice, target language, and training paradigm. Through controlled comparative experiments, we further find that stronger English-centric initialization does not necessarily lead to stronger relative transfer efficiency, even when it yields better multilingual reasoning accuracy. Finally, we show that the largest improvement comes from moving beyond English-only supervision: adding a single parallel language yields a substantial gain, while further gains from additional languages are smaller but consistent. Overall, our results suggest that multilingual mathematical reasoning should be evaluated directly rather than inferred from English benchmarks alone.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: Large Language Model, Reasoning Generalization, Cross-lingual Transfer
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English, Spanish, Russian, German, French, Bengali, Swahili, Thai, Japanese, Chinese, Telugu
Submission Number: 1974