Submission Track: Track 1: Machine Learning Research by Muslim Authors
Keywords: LLMs, chain of thought
Abstract: We propose HALT-CoT, an inference-time criterion that ends a chain-of-thought (CoT) once the model’s answer distribution is sufficiently sharp. After every reasoning step, we compute the Shannon entropy of the predicted answers; when this entropy drops below a threshold, generation stops and the current answer is returned. HALT-CoT is training-free, model-agnostic, and requires only streamed token probabilities.
On GSM8K, StrategyQA, and CommonsenseQA, five state-of-the-art LLMs maintain accuracy within ±0.4 percentage points of full CoT while emitting 15–30% fewer tokens; for example, GPT-4 retains 92% accuracy on GSM8K while decoding 25% fewer tokens. Entropy-over-time traces show that uncertainty falls monotonically in the majority of cases, validating entropy as a halting signal.
Unlike prior early-exit techniques that need extra heads, fine-tuning, or static truncation, HALT-CoT plugs directly into existing CoT pipelines and adapts per instance, delivering a simple path to faster and cheaper LLM reasoning without loss of quality.
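As a rough illustration of the criterion described above (not code from the submission itself), the halting loop might look like the following Python sketch. The helpers `generate_step` and `answer_distribution`, the threshold value 0.3, and the step cap are hypothetical placeholders; the abstract only specifies stopping when the Shannon entropy of the answer distribution falls below a threshold.

```python
import math
from typing import Callable, Dict, List


def shannon_entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate answers."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def halt_cot(
    generate_step: Callable[[List[str]], str],        # hypothetical: emits the next reasoning step
    answer_distribution: Callable[[List[str]], Dict[str, float]],  # hypothetical: P(answer | steps so far)
    entropy_threshold: float = 0.3,                    # assumed value; the paper's threshold is not given here
    max_steps: int = 10,
) -> str:
    """Entropy-based early stopping for chain-of-thought, per the HALT-CoT idea:
    after each reasoning step, halt once the answer distribution is sharp enough."""
    steps: List[str] = []
    probs: Dict[str, float] = {}
    for _ in range(max_steps):
        steps.append(generate_step(steps))       # generate one more reasoning step
        probs = answer_distribution(steps)       # current answer distribution from streamed probabilities
        if shannon_entropy(probs) < entropy_threshold:
            break                                # distribution is sharp: stop reasoning early
    # return the currently most likely answer
    return max(probs, key=probs.get)
```

Because the loop only consumes the answer distribution, the same sketch applies to any model that exposes streamed token probabilities, with no extra heads or fine-tuning.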
Submission Number: 29