Keywords: Compositional Generalization, Reinforcement Learning, Post-training, Mechanism, Knowledge-intensive Reasoning
TL;DR: We find that Reinforcement Learning (RL) synthesizes new complex reasoning skills, but only under a strict condition: the base model must first capture sufficient atomic skills via Supervised Fine-Tuning (SFT).
Abstract: Does Reinforcement Learning (RL) merely amplify existing skills, or does it synthesize novel ones? We investigate this open question through the lens of Complementary Reasoning: the critical practical capability of integrating internal knowledge with external context, a prerequisite for reliable Retrieval-Augmented Generation.
Using a controlled synthetic dataset of open-domain biographies to avoid contamination, we decompose this capability into two atomic skills: Parametric Reasoning (retrieving facts encoded in model weights) and Contextual Reasoning (processing novel information in the context window).
We present two key findings.
First, models supervised directly on the composite task achieve high accuracy on seen facts and reasoning paths (90%) but collapse on novel ones (18%), indicating that Supervised Fine-Tuning (SFT) relies on rote memorization rather than true skill integration. Second, RL acts as a reasoning skill synthesizer rather than a mere amplifier, successfully bridging this generalization gap. However, we uncover a prerequisite: RL can synthesize the new composite strategy only if the base model has first mastered the independent atomic skills via SFT.
These results suggest that decoupled atomic training followed by RL offers a scalable path to synthesizing complex novel skills.
Submission Number: 11