"Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning"

Published: 23 Sept 2025, Last Modified: 07 Dec 2025, FoRLM 2025, CC BY 4.0
Keywords: entropy, early stopping, LLM efficiency, adaptive compute, reasoning, confidence estimation, test-time scaling, gradient-free
TL;DR: A simple, training-free entropy framework that adaptively allocates tokens at test time, saving 25–50% compute while preserving reasoning accuracy across models and benchmarks.
Abstract: We introduce a simple yet novel entropy-based framework for improving token efficiency in large language models on reasoning tasks. Our approach uses Shannon entropy computed from token-level log-probabilities as a confidence signal for early stopping, achieving 25–50% computational savings while maintaining task accuracy. We show that the entropy threshold for stopping reasoning varies from model to model but can be calibrated in one shot using only a few examples from existing reasoning datasets. Our results indicate that models often know early on that they have reached a correct answer, and that this knowledge can be exploited to save tokens and reduce latency on reasoning tasks.
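To make the idea concrete, here is a minimal sketch of how sequence-level entropy from token log-probabilities could drive early stopping. The function names, the top-k renormalization, and the mean aggregation are illustrative assumptions, not the paper's exact formulation; the threshold would be the model-specific value calibrated from a few examples, as described in the abstract.

```python
import math

def token_entropy(top_logprobs):
    """Shannon entropy (in nats) of a single token's candidate distribution,
    approximated from the top-k log-probabilities returned by the model.
    Assumption: the truncated top-k mass is renormalized to sum to 1."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)

def sequence_entropy(per_token_top_logprobs):
    """Aggregate confidence signal: mean token-level entropy over the
    tokens generated so far (one illustrative choice of aggregation)."""
    entropies = [token_entropy(tl) for tl in per_token_top_logprobs]
    return sum(entropies) / len(entropies) if entropies else float("inf")

def should_stop(per_token_top_logprobs, threshold):
    """Early-stopping rule: halt further reasoning once the sequence-level
    entropy drops below a model-specific threshold calibrated on a few
    examples from an existing reasoning dataset."""
    return sequence_entropy(per_token_top_logprobs) < threshold
```

In use, a generation loop would periodically pass the accumulated per-token top-k log-probabilities to `should_stop` and terminate the reasoning trace once the entropy falls below the calibrated threshold, trading a small calibration cost for the reported token savings.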
Submission Number: 118