Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning

ACL ARR 2026 January Submission 6263 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: self-consistency, reasoning efficiency, adaptive sampling
Abstract: Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. Adaptive self-consistency methods mitigate this cost by adjusting the sampling budget; however, they rely on count-based stopping rules that treat all responses equally, often leading to unnecessary sampling. We propose \textbf{Re}liability-Aware \textbf{A}daptive \textbf{S}elf-\textbf{C}onsistency (\texttt{ReASC}), which addresses this limitation by reframing adaptive sampling from response counting to evidence sufficiency, leveraging response-level confidence for principled information aggregation. \texttt{ReASC} operates in two stages: a single-sample decision stage that resolves instances confidently answerable from a single response, and a reliability-aware accumulation stage that aggregates responses by jointly weighting their frequency and confidence. Across five models and four datasets, \texttt{ReASC} consistently achieves a better accuracy-cost trade-off than existing baselines, improving inference efficiency across model scales from 3B to 27B parameters. As a concrete example, \texttt{ReASC} reduces inference cost by up to 70\% relative to self-consistency while preserving accuracy on GSM8K with Gemma-3-4B-it.
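To make the two-stage procedure in the abstract concrete, the following Python snippet is a minimal sketch of a reliability-aware adaptive sampling loop. It is illustrative only: the `sample_response` callable, the thresholds `tau_single` and `tau_stop`, and the evidence-mass stopping margin are all assumptions introduced here, not the paper's actual algorithm or hyperparameters.

```python
import random
from collections import defaultdict
from typing import Callable, Tuple


def reasc_sketch(
    sample_response: Callable[[], Tuple[str, float]],  # hypothetical: returns (answer, confidence in [0, 1])
    tau_single: float = 0.9,   # assumed threshold for the single-sample decision stage
    tau_stop: float = 2.5,     # assumed evidence-mass margin for early stopping
    max_samples: int = 40,     # hard sampling budget, as in standard self-consistency
) -> str:
    """Illustrative two-stage adaptive sampling loop (not the paper's exact rule).

    Stage 1: if the first response is confident enough, answer immediately.
    Stage 2: otherwise keep sampling, accumulating confidence-weighted votes per
    answer, and stop once the leading answer's evidence mass exceeds the
    runner-up's by a fixed margin.
    """
    answer, conf = sample_response()
    if conf >= tau_single:                  # Stage 1: single-sample decision
        return answer

    evidence = defaultdict(float)
    evidence[answer] += conf                # seed accumulation with the first sample
    for _ in range(1, max_samples):         # Stage 2: reliability-aware accumulation
        answer, conf = sample_response()
        evidence[answer] += conf            # frequency and confidence enter jointly
        ranked = sorted(evidence.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if ranked[0] - runner_up >= tau_stop:   # evidence-sufficiency stop
            break
    return max(evidence, key=evidence.get)      # confidence-weighted majority vote


if __name__ == "__main__":
    # Toy stand-in for an LLM sampler: answers "42" 70% of the time,
    # and those responses tend to carry higher confidence.
    def toy_sampler() -> Tuple[str, float]:
        if random.random() < 0.7:
            return "42", random.uniform(0.6, 0.95)
        return "41", random.uniform(0.3, 0.7)

    print(reasc_sketch(toy_sampler))
```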
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: inference methods, self-consistency
Contribution Types: NLP engineering experiment, Approaches to low compute settings - efficiency
Languages Studied: English
Submission Number: 6263