Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning

ACL ARR 2026 January Submission2622 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Adaptive test-time reasoning, Reflective redundancy, Bandit-based early stopping
Abstract: Chain-of-Thought (CoT) reasoning has driven recent gains of large language models (LLMs) on reasoning-intensive tasks by externalizing intermediate steps. However, excessive or redundant reasoning --- so-called overthinking --- can increase inference costs and lead LLMs toward incorrect conclusions. In this paper, we present \textbf{REFRAIN} (\underline{REF}lective-\underline{R}edundancy for \underline{A}daptive \underline{IN}ference), a training-free framework that adaptively determines when to stop reasoning to mitigate overthinking. REFRAIN integrates a two-stage stop discriminator to identify reflective yet redundant reasoning and a sliding-window Upper Confidence Bound (SW-UCB) multi-armed bandit controller to dynamically adjust stopping thresholds according to problem difficulty without supervision or fine-tuning. Across four representative benchmarks and two model families, REFRAIN reduces token usage by 20-55\% while maintaining or improving accuracy compared to standard CoT prompting. Extensive ablation and robustness analyses demonstrate its stability across models, scorers, and prompt variations. In summary, our findings highlight when-to-stop as a new and practical axis of test-time scaling --- enabling models to reason not just more, but just enough.
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: efficient models, inference methods
Contribution Types: NLP engineering experiment, Approaches to low-compute settings: efficiency
Languages Studied: English
Submission Number: 2622