Patching Gaps In LLM Reasoning With Interventional Training

ICLR 2026 Conference Submission 14634 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: LLM, reasoning, intervention, SFT, RL
Abstract: Reinforcement learning (RL) training of large language models (LLMs) is limited by the policy's ability to generate rollouts with non-zero rewards: without such rewards, the policy is not updated and learning is stalled on hard problems, which are problems that the policy consistently fails to sample any correct rollouts for. We find that many hard problems remain unsolved due to the repeated generation of incorrect intermediate steps in a long reasoning trace; identifying and fixing these requires performing better \emph{credit assignment}. But existing approaches for credit assignment are either impractical or impose a substantial data-writing burden on oracles (\textit{e.g.}, humans). In this paper, we introduce \textbf{Interventional Training} (InT), a framework that leverages single-step oracle interventions to improve LLM reasoning. Given a reasoning attempt and ground-truth answer, the oracle detects and then provides language feedback on a single intermediate reasoning step, which is much cheaper than obtaining a full reasoning trace. \methodname{} then \emph{patches} the LLM by running supervised fine-tuning on the on-policy rollout up to the error, followed by the correction from the oracle. RL on this patched model now generates counterfactual traces and with merely $\approx$$100$ interventions from the oracle, \methodname{} solves 16\% more hard test problems that were previously unsolved (only zero rewards) and also improves performance across multiple standard evals.
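
A minimal sketch of the patching step described in the abstract, assuming reasoning traces are represented as a list of step strings; the names (`Intervention`, `build_patch_example`) and the (prompt, completion) formatting are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch: construct an SFT example from an on-policy rollout prefix
# plus a single-step oracle correction, as described in the InT abstract.
# All names and the data format here are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Intervention:
    problem: str               # problem statement given to the policy
    rollout_steps: List[str]   # on-policy reasoning steps sampled from the current policy
    error_index: int           # index of the first incorrect step, identified by the oracle
    correction: str            # single-step language feedback / corrected step from the oracle


def build_patch_example(iv: Intervention) -> dict:
    """Build one SFT example: the on-policy rollout up to (but excluding) the
    erroneous step, followed by the oracle's single-step correction as the target."""
    prefix = "\n".join(iv.rollout_steps[: iv.error_index])
    prompt = f"{iv.problem}\n{prefix}\n"
    return {"prompt": prompt, "completion": iv.correction}


if __name__ == "__main__":
    iv = Intervention(
        problem="Compute 17 * 24.",
        rollout_steps=["17 * 24 = 17 * 20 + 17 * 4", "= 340 + 58"],  # second step is wrong
        error_index=1,
        correction="= 340 + 68",
    )
    print(build_patch_example(iv))
```

Under this sketch, the resulting (prompt, completion) pairs would be used for supervised fine-tuning of the policy before resuming RL, so that subsequent rollouts can continue past the previously repeated error.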
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14634