CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency

ICLR 2026 Conference Submission 22675 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, Self-Consistency, Test-Time Scaling, Efficiency, Reasoning
Abstract: Large language models (LLMs) are often queried multiple times at test time, with predictions aggregated by majority vote. While effective, this self-consistency strategy (Wang et al., 2023) requires a fixed number of calls and fails when the correct answer is infrequent. We introduce Confidence-Guided Early Stopping (CGES), a Bayesian framework that forms posteriors over candidate answers from scalar confidence signals—derived from token probabilities or reward models—and adaptively halts sampling once posterior mass exceeds a threshold. We provide theoretical guarantees in both the ideal case of perfectly calibrated confidences and the realistic regime with noisy confidences. Averaged over five reasoning benchmarks, CGES reduces the average number of calls by 69.4% (e.g., from 16.0 to 4.9) while maintaining accuracy within 0.06 percentage points of self-consistency.
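To make the halting rule in the abstract concrete, the sketch below accumulates scalar confidences per candidate answer, normalizes them into a posterior via a softmax, and stops sampling once the top candidate's posterior mass exceeds a threshold. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the `generate_answer` callable, the softmax-over-summed-confidences posterior, and the default budget of 16 calls are all placeholders.

```python
import math
from collections import defaultdict

def cges_sample(generate_answer, threshold=0.9, max_calls=16):
    """Confidence-guided early stopping (illustrative sketch).

    `generate_answer` is a hypothetical callable returning a
    (answer, confidence) pair per model call, where confidence is a
    scalar derived from token probabilities or a reward model.
    """
    log_weight = defaultdict(float)  # unnormalized log posterior mass per candidate
    best = None
    for call in range(1, max_calls + 1):
        answer, conf = generate_answer()
        # Treat the scalar confidence as additive evidence for this candidate.
        log_weight[answer] += conf
        # Softmax over accumulated weights yields a posterior over candidates.
        m = max(log_weight.values())
        z = sum(math.exp(v - m) for v in log_weight.values())
        posterior = {a: math.exp(v - m) / z for a, v in log_weight.items()}
        best, mass = max(posterior.items(), key=lambda kv: kv[1])
        if mass >= threshold:
            return best, call  # posterior mass exceeds threshold: halt early
    # Budget exhausted: return the highest-mass candidate seen so far.
    return best, max_calls
```

Under this scheme, easy queries where repeated samples agree with high confidence terminate after a few calls, while ambiguous queries consume the full budget, which is consistent with the reported drop in average calls from 16.0 to 4.9.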
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 22675