Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions

Published: 17 Oct 2025 · Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: reinforcement learning, reasoning
TL;DR: “RL shortcuts” for LLMs (e.g., one example, noisy/no rewards, negative samples) only work under strong model-task alignment.
Abstract: Recent advances in applying reinforcement learning (RL) to large language models (LLMs) have led to substantial progress. In particular, a series of remarkable yet often counterintuitive phenomena have been reported for LLMs, exhibiting patterns not typically observed in traditional RL settings (e.g., spurious rewards, one-shot RL). However, the precise conditions under which these observations hold remain unclear. In this work, we identify a key factor that differentiates RL observations: whether the pretrained model already exhibits strong *Model-Task Alignment* with the target task, as measured by pass@k. Through systematic experiments across diverse models and tasks, we find that while standard RL training remains robust, many of the counterintuitive results emerge only under strong model-task alignment.
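The abstract measures model-task alignment via pass@k. As a point of reference, below is a minimal sketch of the standard unbiased pass@k estimator (in the style of Chen et al., 2021); the function names, the `k = 8` default, and the aggregation of per-problem scores into a single alignment number are illustrative assumptions, not the paper's stated procedure.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions for a problem,
    of which c are correct, estimate P(at least one correct among k samples)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

def model_task_alignment(samples_per_problem, k: int = 8) -> float:
    """Hypothetical aggregate: mean pass@k over a task's problems,
    where samples_per_problem is a list of (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in samples_per_problem]
    return sum(scores) / len(scores)

# Example: 3 problems, 16 samples each, with 12, 2, and 0 correct completions.
print(model_task_alignment([(16, 12), (16, 2), (16, 0)], k=8))
```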
Submission Number: 103