Critical tokens and inference-time scaling

Published: 28 Apr 2026, Last Modified: 28 Apr 2026, Venue: MSLD 2026 Poster, Readers: Everyone, License: CC BY 4.0
Keywords: critical token, inference-time scaling
TL;DR: We analyze the implicit assumptions made by popular inference-time scaling methods (reward-guided tree search, forced scaling, long reasoning traces) through the lens of the final-answer *distribution* and the critical-token phenomenon.
Abstract: Many inference-time scaling approaches (e.g., reward-guided tree search, forced scaling, and long reasoning traces) rely on implicit assumptions about the autoregressive generation process. We challenge these assumptions by analyzing the probability distribution over final answers, leveraging the observation that sampling continuations after "critical tokens" yields a deterministic answer distribution. This lens exposes three common misconceptions. (1) Reward-guided tree search assumes that resampling from incorrect intermediate steps improves final accuracy, but we show that many errors are committed at or after critical tokens, where resampling fails to alter the original outcome. (2) Forced scaling (s1; Muennighoff et al., 2025) appends a "Wait" token to induce self-correction, but inserting "Wait" after the critical token is far less effective at altering the final answer than inserting it before. (3) "Distilled reasoning models" that produce long reasoning traces are often said to learn self-correction from teacher models; however, comparing critical token positions reveals that distilled models instead acquire first-pass reasoning, while RL-trained models are the ones that genuinely learn to revise prior solutions. Across all three cases, the assumed and actual mechanisms of inference-time scaling methods exhibit significant gaps, which become visible only when decoding dynamics are directly observed by extensive sampling.
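Illustrative sketch (not the authors' procedure): one way to observe the final-answer distribution by sampling is to draw many continuations from each prefix of a reasoning trace and record the empirical distribution over extracted answers. The sketch below reports the shortest prefix after which a single answer dominates. The `SampleFn` interface, the 0.95 threshold, the sample count, and the toy sampler that stands in for a language model are all assumptions made for illustration; the abstract does not specify how critical tokens are operationalized.

```python
import random
from collections import Counter
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: given a token prefix, sample one continuation
# and return only the final answer extracted from it.
SampleFn = Callable[[Sequence[str], random.Random], str]


def answer_distribution(
    sample_fn: SampleFn, prefix: Sequence[str], n_samples: int, rng: random.Random
) -> Dict[str, float]:
    """Empirical distribution over final answers for continuations of `prefix`."""
    counts = Counter(sample_fn(prefix, rng) for _ in range(n_samples))
    return {answer: count / n_samples for answer, count in counts.items()}


def first_locked_prefix_length(
    tokens: Sequence[str],
    sample_fn: SampleFn,
    n_samples: int = 200,
    threshold: float = 0.95,  # assumed cutoff for "effectively deterministic"
    seed: int = 0,
) -> Optional[int]:
    """Shortest prefix length k whose most likely answer has mass >= threshold.

    Under this (assumed) operationalization, tokens[k - 1] is the token after
    which resampling rarely changes the final answer.
    """
    rng = random.Random(seed)
    for k in range(len(tokens) + 1):
        dist = answer_distribution(sample_fn, tokens[:k], n_samples, rng)
        if max(dist.values()) >= threshold:
            return k
    return None


def toy_sampler(prefix: Sequence[str], rng: random.Random) -> str:
    """Toy stand-in for a language model; purely illustrative."""
    if "subtract" in prefix:  # once this token is emitted, the answer is fixed
        return "wrong"
    if "add" in prefix:
        return "right"
    return "right" if rng.random() < 0.6 else "wrong"


trace = ["First", "we", "subtract", "the", "two", "numbers"]
locked_at = first_locked_prefix_length(trace, toy_sampler)
print(locked_at, trace[locked_at - 1] if locked_at else None)
```

In this toy setting, prefixes that stop before "subtract" give a mixed answer distribution, while every prefix that includes it yields the same answer. The same probe could in principle compare the effect of inserting "Wait" before versus after that position, by resampling from the prefix with "Wait" appended and checking whether the answer distribution moves.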
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 32