Mitigating Overthinking in Language Models Using Dynamic Stopping Criteria

ACL ARR 2026 January Submission994 Authors

26 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM efficiency, dynamic stopping, chain-of-thought reasoning, overthinking in language models, early exit in generation, inference control, token efficiency
Abstract: Large language models (LLMs) often overthink, generating unnecessarily long chains of thought that waste computation and can even reduce accuracy. We propose a dynamic stopping framework that detects overthinking in real time and halts generation once further reasoning is predicted to be unproductive. Our method computes a novel overthinking score from the model's own output and internal signals, combined with domain-specific stopping triggers. Experiments on the MATH500 benchmark show that our approach reduces generation tokens by 30\% on average while maintaining competitive accuracy (within 5\% of the no-stopping baseline). We compare against recent overthinking mitigation methods and demonstrate that our method achieves a favorable balance of efficiency and reliability without requiring additional training or external calibration. Finally, we discuss interpretability insights and future directions for understanding and intervening in overthinking behavior in LLMs.
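The abstract's core idea, scoring the ongoing generation and halting once further reasoning looks unproductive, can be illustrated with a minimal sketch. The paper's actual score uses the model's output and internal signals; here we substitute a purely illustrative proxy (the fraction of recent reasoning steps that repeat earlier ones), so `overthinking_score`, the `window`, and the `threshold` are all assumptions, not the authors' method.

```python
# Hypothetical sketch of a dynamic stopping loop (not the paper's
# implementation). The score here is an illustrative stand-in: it rises
# when the last `window` reasoning steps merely repeat earlier ones.

def overthinking_score(steps, window=3):
    """Fraction of the last `window` steps that duplicate an earlier step."""
    recent = steps[-window:]
    if not recent:
        return 0.0
    start = max(len(steps) - window, 0)
    repeats = sum(1 for i, s in enumerate(recent, start=start) if s in steps[:i])
    return repeats / len(recent)

def generate_with_dynamic_stop(step_stream, threshold=0.5, max_steps=50):
    """Consume reasoning steps, halting once the score crosses `threshold`
    (further reasoning predicted unproductive) or `max_steps` is reached."""
    steps = []
    for step in step_stream:
        steps.append(step)
        if len(steps) >= max_steps:
            break
        if overthinking_score(steps) >= threshold:
            break  # stop early instead of continuing to generate
    return steps

# Toy stream: the "model" starts looping over the same check.
stream = ["simplify", "substitute", "check", "check", "check", "check", "answer"]
print(generate_with_dynamic_stop(stream))
# → ['simplify', 'substitute', 'check', 'check', 'check']
```

In a real decoder this check would run per token or per reasoning step inside the generation loop (e.g. as a stopping criterion), trading a small amount of accuracy for the token savings the abstract reports.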
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: quantization, pruning, LLM Efficiency
Contribution Types: Model analysis & interpretability, Approaches for low compute settings-efficiency
Languages Studied: Python, English
Submission Number: 994