Keywords: Efficient Reasoning, Efficient Inference Methods, Chain of Thought (CoT), Adaptive Halting
TL;DR: LEASH is a training-free, decoding-time heuristic that watches the slope of token-level entropy and the top-logit margin for a plateau, then halts chain-of-thought generation and prompts for the final answer. No extra models or retraining are needed.
Abstract: Chain-of-Thought (CoT) prompting is a key technique for enabling complex reasoning in large language models. However, generating full, fixed-length rationales is computationally wasteful, inflating both token usage and latency. We introduce **LEASH**: **L**ogit-**E**ntropy **A**daptive **S**topping **H**euristic, a training-free decoding algorithm that adaptively halts rationale generation. **LEASH** monitors two intrinsic signals: the slope of token-level entropy and the improvement in the top-logit margin. It terminates generation once both signals plateau, indicating that the model has reached a stable reasoning state. Across four instruction-tuned models on the GSM8K and AQuA-RAT benchmarks, **LEASH** reduces average token generation by $\approx$ 30--35\% and latency by $\approx$ 27\%, at the cost of a $\approx$ 10 percentage-point accuracy drop relative to standard CoT. **LEASH** is model-agnostic and requires no additional training or supervision, offering a simple and efficient alternative to CoT decoding.
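The abstract does not spell out the exact plateau test, so below is a minimal Python sketch of the kind of criterion it describes: track per-token entropy and top-logit margin during decoding, and halt once a windowed entropy slope and the margin improvement over the same window both fall below small thresholds. The function names (`entropy`, `top_margin`, `leash_should_halt`), the window size, the thresholds, and the toy logits in the demo are illustrative assumptions, not the paper's actual parameters.

```python
import math

def entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)

def top_margin(logits):
    """Gap between the largest and second-largest logit."""
    a, b = sorted(logits, reverse=True)[:2]
    return a - b

def leash_should_halt(entropies, margins, window=8,
                      eps_slope=0.01, eps_margin=0.01):
    """Assumed plateau test: halt once (a) the least-squares slope of
    entropy over the last `window` tokens and (b) the top-logit margin
    improvement over that window both drop below small thresholds."""
    if len(entropies) < window:
        return False
    recent_e = entropies[-window:]
    x_mean = (window - 1) / 2
    e_mean = sum(recent_e) / window
    slope = sum((x - x_mean) * (e - e_mean)
                for x, e in enumerate(recent_e))
    slope /= sum((x - x_mean) ** 2 for x in range(window))
    margin_gain = margins[-1] - margins[-window]
    return abs(slope) < eps_slope and margin_gain < eps_margin

if __name__ == "__main__":
    # Toy decoding loop: synthetic logits whose top margin saturates,
    # standing in for the per-token logits of a real language model.
    entropies, margins = [], []
    for step in range(64):
        boost = 5.0 * math.tanh(step / 10.0)  # saturating confidence
        logits = [boost] + [0.0] * 9
        entropies.append(entropy(logits))
        margins.append(top_margin(logits))
        if leash_should_halt(entropies, margins):
            print(f"halt CoT at token {step}; prompt for the final answer")
            break
```

In a real decoding loop, the logits would come from the language model at each generated token, and on halt the decoder would, per the TL;DR, stop the rationale and prompt the model for the final answer directly.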
Submission Number: 298