Keywords: reasoning, reinforcement learning, multimodal reasoning, medical reasoning model
TL;DR: X-Reasoner, post-trained solely on general-domain text via SFT and RL, achieves strong cross-domain and cross-modal reasoning, surpassing state-of-the-art models trained on in-domain and multimodal data.
Abstract: Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited mainly to mathematical and general-domain tasks. It therefore remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: general-domain, text-based post-training can enable such strong generalizable reasoning, and is even more effective than in-domain multimodal training. Leveraging this finding, we introduce X-REASONER, a vision-language model post-trained for generalizable reasoning solely on general-domain text, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chains of thought, followed by reinforcement learning with verifiable rewards. Experiments show that X-REASONER successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing models trained with in-domain and multimodal data across various general and medical benchmarks (Figure 1). Additionally, we find that X-REASONER's performance in specialized domains can be further enhanced through continued training on domain-specific text-only data. Building upon this, we introduce X-REASONER-MED, a medical-specialized variant that achieves state-of-the-art (SOTA)-level performance on numerous text-only and multimodal medical benchmarks.
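The second training stage, reinforcement learning with verifiable rewards, can be illustrated with a minimal sketch. The function below is a hypothetical binary reward that checks a model's final answer against a ground-truth label; the answer format (`\boxed{...}`) is an assumption for illustration and may differ from the paper's actual implementation.

```python
import re

def verifiable_reward(model_output: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the model's final answer,
    emitted as \\boxed{...}, exactly matches the ground truth; else 0.0.

    This is a hypothetical sketch, not the paper's implementation:
    the actual answer-extraction and matching rules may differ.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable answer -> zero reward
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0
```

Because the reward is computed by deterministic verification rather than a learned reward model, it is robust to reward hacking, which is why verifiable-reward RL is a common choice for reasoning post-training.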
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14379