Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

ICLR 2026 Conference Submission 23008 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026
License: CC BY 4.0
Keywords: LLM Reasoning, RLVR
Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly improved reasoning in large language models (LLMs), yet the token-level mechanisms through which it reshapes model behavior remain unclear. We present a systematic empirical study of RLVR's distributional effects across three complementary axes: (1) token-level distributional shifts, (2) functional validation via cross-sampling interventions, and (3) exploratory investigations of advantage-signal modulation based on token divergence. We find that RL fine-tuning induces sparse, targeted changes, with only a small fraction of tokens exhibiting significant distributional divergence, and we further analyze the nature of these shifts. These divergent distributions are not uniformly predicted by entropy, indicating that RLVR can modify both initially high- and low-entropy distributions under different settings. Cross-sampling experiments reveal that inserting just a small fraction of RL-sampled tokens into base-model generations recovers most of the RL performance gains, while injecting a small portion of base-sampled tokens into RL generations collapses performance to base levels, functionally isolating the critical role of divergent tokens. Finally, we explore divergence-weighted variants of the advantage signal, finding that they can amplify improvements over baselines. Our work sheds light on the distributional changes induced by RLVR and provides a granular, token-level lens for understanding and improving RL fine-tuning in LLMs.
Primary Area: reinforcement learning
Submission Number: 23008
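The abstract's "divergence-weighted variants of the advantage signal" can be pictured as measuring per-token divergence between the RL policy and the base model and then reweighting a sequence-level advantage accordingly. The sketch below is illustrative only: the function names, the choice of KL(RL || base) as the divergence, and the linear weighting form are assumptions, not the paper's actual scheme.

```python
import torch
import torch.nn.functional as F


def token_divergence(base_logits: torch.Tensor, rl_logits: torch.Tensor) -> torch.Tensor:
    """Per-token KL(RL || base) between the two policies' next-token
    distributions. Shapes: (seq_len, vocab_size) -> (seq_len,).
    The direction of the KL is an assumption for illustration."""
    rl_logp = F.log_softmax(rl_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)
    return (rl_logp.exp() * (rl_logp - base_logp)).sum(dim=-1)


def divergence_weighted_advantage(advantage: float,
                                  divergence: torch.Tensor,
                                  alpha: float = 1.0) -> torch.Tensor:
    """Broadcast a scalar (per-sequence) advantage to tokens, up-weighting
    tokens whose distributions diverged most from the base model.
    The weighting 1 + alpha * (normalized divergence - 1) keeps the mean
    per-token advantage roughly equal to the original scalar; this form is
    a hypothetical choice, not the one used in the paper."""
    w = divergence / (divergence.mean() + 1e-8)  # mean-normalized divergence
    return advantage * (1.0 + alpha * (w - 1.0))


# Hypothetical usage: `base_logits` and `rl_logits` are (seq_len, vocab) logits
# for the same sampled sequence; `A` is a verifiable-reward advantage.
base_logits = torch.randn(8, 32000)
rl_logits = torch.randn(8, 32000)
A = 1.0
per_token_adv = divergence_weighted_advantage(A, token_divergence(base_logits, rl_logits), alpha=0.5)
```

With alpha = 0, this reduces to the standard uniform per-token advantage; larger alpha concentrates the learning signal on the sparse set of divergent tokens the abstract highlights.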