Think Faster Than Words: Efficient LLM Chain-of-Thought Reasoning via Dynamic Shortcut Decoding

ACL ARR 2026 January Submission723 Authors

24 Dec 2025 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Efficient Reasoning, Chain-of-Thought, Shortcut Decoding
Abstract: This paper proposes Shortcut Decoding, an efficient framework for accelerating Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs). Existing methods that prune reasoning steps or apply early stopping to reduce latency often compromise reasoning reliability. Motivated by the observation that LLMs frequently converge to correct solutions internally before completing their explicit textual reasoning, we propose a dual-signal adaptive controller that integrates lightweight probes over internal hidden states with step-level entropy. This controller detects reasoning convergence during generation and adaptively selects between a fast exit path and a stability-verified path, removing redundant steps while preserving answer correctness. Experiments across multiple mathematical reasoning benchmarks demonstrate that Shortcut Decoding reduces token usage by approximately 35\%, maintains final-answer accuracy comparable to the full CoT baseline, and outperforms existing early-stopping methods, all without updating the base model. Our code is available at https://anonymous.4open.science/r/test-15A.
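The dual-signal control logic described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold values, the linear-probe form, and the three path labels (`fast_exit`, `stability_verified`, `continue`) are all assumptions made for exposition.

```python
import math

# Hypothetical thresholds; the paper's actual values are not specified here.
ENTROPY_THRESHOLD = 0.5   # step-level entropy below this suggests convergence
PROBE_THRESHOLD = 0.9     # probe confidence above this suggests the answer is settled

def step_entropy(probs):
    """Shannon entropy (natural log) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def probe_confidence(hidden_state, probe_weights, probe_bias=0.0):
    """Lightweight linear probe over a hidden state, squashed to a confidence in (0, 1)."""
    z = sum(h * w for h, w in zip(hidden_state, probe_weights)) + probe_bias
    return 1.0 / (1.0 + math.exp(-z))

def select_path(hidden_state, probs, probe_weights):
    """Dual-signal controller: combine probe confidence and step-level entropy.

    Returns one of:
      "fast_exit"          -- both signals agree that reasoning has converged
      "stability_verified" -- one signal fires, so exit only after a stability check
      "continue"           -- neither signal fires, keep generating reasoning steps
    """
    conf = probe_confidence(hidden_state, probe_weights)
    ent = step_entropy(probs)
    if conf >= PROBE_THRESHOLD and ent <= ENTROPY_THRESHOLD:
        return "fast_exit"
    if conf >= PROBE_THRESHOLD or ent <= ENTROPY_THRESHOLD:
        return "stability_verified"
    return "continue"
```

In this sketch, the fast path is taken only when both signals agree, which mirrors the abstract's claim that redundant steps are removed without sacrificing answer correctness; when the signals disagree, the controller falls back to the stability-verified path.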
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, pruning, inference methods, probing, calibration/uncertainty, chain-of-thought
Contribution Types: Approaches to low-compute settings - efficiency
Languages Studied: English
Submission Number: 723