Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track poster · License: CC BY 4.0
Keywords: large language model, reasoning, reinforcement learning
Abstract: Reinforcement learning (RL) has shown promise in enhancing large language model (LLM) reasoning, yet progress toward broader capabilities is limited by the availability of high-quality, multi-domain datasets. This work introduces Guru, a 92K RL-for-reasoning dataset designed to address this gap, covering six reasoning domains: Math, Code, Science, Logic, Simulation, and Tabular, each with corresponding verifiers. We build Guru via a careful data-curation pipeline, including sourcing, deduplication, reward design, and domain-specific and difficulty-based filtering, to facilitate the systematic investigation of cross-domain RL generalization. Our study using Guru suggests the efficacy of a simple mixed-domain RL training approach and reveals several key aspects affecting cross-domain transferability. We further train two models, Guru-7B and Guru-32B, purely with RL on our curated data and observe substantially improved performance over leading open RL reasoning model baselines, with gains of 7.3% and 7.8%, respectively, on an extensive 17-task, six-domain evaluation suite. We release our dataset, code, and evaluation suite to the community, aiming to support further research and development of more general RL-enhanced reasoning models.
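
The abstract does not spell out how difficulty-based filtering is implemented. A common approach in RL-for-reasoning pipelines is to score each prompt by the pass rate of a reference policy under the domain's verifier and keep only mid-difficulty examples. The sketch below illustrates that idea; the helper names (`policy.generate`, `verifier`) and the thresholds are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of difficulty-based filtering: keep prompts whose
# pass rate under a reference policy falls in a mid-difficulty band, so
# RL training sees neither trivially solved nor hopeless examples.

def pass_rate(prompt, answer, policy, verifier, n_rollouts=8):
    """Fraction of sampled rollouts that the domain verifier accepts."""
    rollouts = [policy.generate(prompt) for _ in range(n_rollouts)]
    return sum(verifier(r, answer) for r in rollouts) / n_rollouts

def difficulty_filter(examples, policy, verifiers, low=0.1, high=0.9):
    """Drop examples that are too easy (rate > high) or too hard (rate < low)."""
    kept = []
    for ex in examples:
        verifier = verifiers[ex["domain"]]  # e.g., math, code, or logic verifier
        rate = pass_rate(ex["prompt"], ex["answer"], policy, verifier)
        if low <= rate <= high:
            kept.append(ex)
    return kept
```

Filtering on verifier pass rates rather than static heuristics lets the difficulty notion track the actual model being trained, which matters when curating data across domains as different as math proofs and tabular reasoning.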
Croissant File: json
Dataset URL: https://huggingface.co/datasets/LLM360/guru-RL-92k
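
The dataset is hosted on the Hugging Face Hub, so it should load with the standard `datasets` API. A minimal sketch, assuming the usual `train` split; check the dataset card for the actual splits and column schema.

```python
# Minimal sketch: load the released Guru dataset with the Hugging Face
# `datasets` library. Split and field names are assumptions; consult the
# dataset card at the URL above for the real schema.
from datasets import load_dataset

ds = load_dataset("LLM360/guru-RL-92k", split="train")
print(ds)      # dataset size and column names
print(ds[0])   # inspect one curated example
```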
Supplementary Material: zip
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 2447