Statistical Early Stopping for Reasoning Models

Published: 16 Oct 2025, Last Modified: 10 Nov 2025NeurIPS 2025 ER WorkshopEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Efficient Reasoning, Conformal Prediction
Abstract: While LLMs have seen substantial improvement in reasoning capabilities, they also sometimes overthink, generating unnecessary reasoning steps, particularly under uncertainty given ill-posed or ambiguous queries. We introduce statistically principled early stopping methods that monitor uncertainty signals during generation to mitigate this issue. Our first approach is nonparametric and provides finite-sample guarantees on the probability of halting too early on well-posed queries. Our second approach is parametric: it models inter-arrival times of uncertainty keywords as a renewal process and applies sequential testing for stopping. We conduct empirical evaluations on reasoning tasks across several domains and models. Our results indicate that uncertainty-aware early stopping can improve both efficiency and reliability in LLM reasoning. The performance varies across domains, and we observe especially significant gains for math reasoning.
Submission Number: 137
Loading