Keywords: LLMs, Reasoning, Multilingual
Abstract: Multilingual reasoning has recently emerged as a powerful strategy for extending the reach and impact of large language models (LLMs). By enabling models to operate effectively across diverse languages and modalities, it broadens access to advanced reasoning capabilities for a wider range of users and linguistic communities. Yet reliably activating such behaviours through training remains difficult. Existing approaches rely heavily on supervised fine-tuning over synthetic data, which tends to encourage imitation of teacher signals rather than genuine exploration or robust generalisation.
To address this gap, we propose \textbf{Polyglot-R1}, the first reinforcement learning framework designed to cultivate multilingual, multi-perspective reasoning behaviours for complex, real-world tasks. Our framework introduces a progressive curriculum that directly tackles the cold-start problem in reinforcement learning training. We begin with supervised fine-tuning on trajectories from more straightforward multilingual prompts to instil the foundations of this reasoning style. We then transition to reinforcement learning, enabling the model to actively explore and generalise this skill on more challenging multilingual and multimodal problems. Experiments demonstrate that Polyglot-R1 not only improves accuracy but also reshapes the way models reason. At earlier stages of training, multilingual reasoning functions as an exploration strategy, encouraging the model to test diverse lines of thought. At later stages, the same capacity is repurposed as a mechanism for multi-perspective verification, strengthening confidence in the final answer. Most importantly, we validate multilingual reasoning as an intermediate exploration scaffold: a temporary but crucial phase that unlocks more robust, transferable reasoning capabilities across languages.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24718