Keywords: Large Reasoning Model, LRM, Chain-of-Thought, DeepSeek
Abstract: Smaller Large Reasoning Models (LRMs) have shown remarkable capabilities for their size. However, due to Chain-of-Thought (CoT) reasoning, these models often produce redundant and verbose reasoning chains even when short reasoning suffices, leading to excessive computation and token generation. We propose a training-free early-exit approach that detects newline-scoped, low-confidence connector words and self-truncates at the boundary of the previous step when that step shows sufficient semantic similarity to the original prompt. Our three-pronged, training-free approach can be easily incorporated into open-source LRMs such as DeepSeek-Distill-Qwen-7B, DeepSeek-Distill-Llama-8B, and QwQ-32B. Experiments across GSM8K, MATH500, and AMC show a minimal reduction in average accuracy alongside a significant decrease in average token count. More broadly, our method highlights the potential of using low-confidence tokens to identify self-truncation points for early exiting.
Submission Number: 141
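The sketch below illustrates, in Python, the kind of early-exit check the abstract describes: trigger on a connector word that opens a new line with low confidence, then self-truncate only if the previous reasoning step is semantically close to the original prompt. The connector list, both thresholds, and the `embed` callable are illustrative assumptions, not the authors' exact configuration.

```python
from typing import Callable, Sequence
import math

# Assumed examples of connector words that often start redundant reasoning steps.
CONNECTORS = {"wait", "alternatively", "hmm", "but"}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)


def should_exit(
    token: str,
    token_prob: float,
    at_newline: bool,
    prev_step: str,
    prompt: str,
    embed: Callable[[str], Sequence[float]],  # e.g., any sentence-embedding model
    prob_threshold: float = 0.3,  # assumed value, not from the paper
    sim_threshold: float = 0.7,   # assumed value, not from the paper
) -> bool:
    """Return True if generation should self-truncate at the previous step boundary.

    Fires only when a connector word begins a new line with low confidence and
    the previous reasoning step is semantically similar to the original prompt.
    """
    is_low_conf_connector = (
        at_newline
        and token.strip().lower() in CONNECTORS
        and token_prob < prob_threshold
    )
    if not is_low_conf_connector:
        return False
    return cosine(embed(prev_step), embed(prompt)) >= sim_threshold
```

In a decoding loop, this check would run each time a candidate token follows a newline; on a positive result, the chain is cut at the end of the previous step and the model is prompted to produce its final answer.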