Keywords: reasoning, domain transfer
Abstract: Reinforcement learning (RL) has proven effective for eliciting reasoning in math and code, yet expanding these capabilities to general domains is hindered by the scarcity of reliable verification signals. We propose Reasoning Curriculum, a two-stage curriculum designed to bootstrap broad reasoning from verifiable domains. Hypothesizing that math serves as a high-signal "gym" for cognitive skill discovery, Stage 1 utilizes math-only RL to elicit core reasoning behaviors. Stage 2 subsequently transfers and refines these skills across diverse domains via joint RL. The curriculum is minimal and backbone-agnostic, requiring no specialized reward models beyond standard verifiability checks. Evaluated on Qwen3-4B and Llama-3.1-8B, Reasoning Curriculum yields consistent gains across math, STEM, code, logic, and simulation. Crucially, our analysis confirms that math-first elicitation fosters transferable reasoning skills that spontaneously emerge in non-math domains, demonstrating that both training stages are essential for maximizing performance. Reasoning Curriculum provides a compact, easy-to-adopt recipe for general reasoning.
Paper Type: Long
Research Area: Generalizability and Transfer
Research Area Keywords: reasoning, transfer
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7583