Keywords: LLM Reasoning, CoT Reasoning, Speculative Reasoning, LLM Inference Optimization
Abstract: Large language models achieve strong reasoning performance, but inference strategies such as Self-Consistency (SC) are computationally expensive, as they fully expand all reasoning traces. We introduce $\textbf{PoLR}$ ($\textit{Path of Least Resistance}$), the first inference-time method to leverage $\textit{prefix consistency}$ for compute-efficient reasoning. PoLR clusters short prefixes of reasoning traces, identifies the dominant cluster, and fully expands only the paths in that cluster, preserving the accuracy benefits of SC while substantially reducing token usage and latency. Our theoretical analysis, framed via mutual information and entropy, explains why early reasoning steps encode strong signals predictive of final correctness. Empirically, PoLR consistently matches or exceeds SC across $\textit{GSM8K}$, $\textit{Math500}$, $\textit{AIME24/25}$, and $\textit{GPQA-Diamond}$, reducing token usage by up to 60% and wall-clock latency by up to 50%. Moreover, PoLR is fully complementary to adaptive inference methods (e.g., Adaptive Consistency, Early-Stopping SC) and can serve as a drop-in pre-filter, making SC substantially more efficient and scalable without requiring model fine-tuning.
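The pipeline described in the abstract (cluster short prefixes, keep the dominant cluster, expand only those paths, then vote as in SC) can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so everything below is an assumption made for illustration: the token-overlap (Jaccard) similarity, the greedy clustering rule, the `threshold` value, and the names `jaccard`, `cluster_prefixes`, `polr_sketch`, and `expand` are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two prefixes (an assumed, illustrative similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_prefixes(prefixes: Sequence[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering (assumed): a prefix joins the first cluster whose
    first member is at least `threshold`-similar, else it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, prefix in enumerate(prefixes):
        for members in clusters:
            if jaccard(prefixes[members[0]], prefix) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def polr_sketch(
    prefixes: Sequence[str],
    expand: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Expand only the prefixes in the largest cluster, then majority-vote.

    `expand` is a hypothetical stand-in for the model call that completes a
    reasoning prefix and returns its final answer.
    """
    clusters = cluster_prefixes(prefixes, threshold)
    dominant = max(clusters, key=len)  # ties resolve to the earliest cluster
    answers = [expand(prefixes[i]) for i in dominant]
    return Counter(answers).most_common(1)[0][0]
```

In this sketch the savings come from calling `expand` only on members of the dominant cluster; prefixes outside it are discarded after the short prefix has been generated. A real system would likely swap the token-overlap similarity for an embedding-based one, which is again an assumption rather than something the abstract states.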
Submission Number: 45