Multi-path reasoning on a budget: towards theoretically optimal hyperparameter-free adaptive self-consistency
Keywords: Chain-of-Thought, LLMs, adaptive self-consistency, sample efficiency
TL;DR: We demonstrate sample-optimal adaptive self-consistency for chain-of-thought reasoning in large language models.
Abstract: Self-consistency (SC) is one of the most popular test-time inference techniques for improving performance in chain-of-thought reasoning. It consists of generating multiple responses, or ``samples'', from a large language model (LLM) and selecting the most frequent answer, which can be viewed as an application of majority voting and mode estimation. Despite its effectiveness, self-consistency is prohibitively expensive at scale when applied naively across a dataset. By leveraging mode estimation and voting theory, we design Blend-ASC, a novel variant of self-consistency that dynamically allocates samples across questions, achieving state-of-the-art sample efficiency. Our approach uses $6.8\times$ fewer samples on average than adaptive and fixed-allocation self-consistency baselines, demonstrating its superior efficiency. Moreover, Blend-ASC is both lightweight and hyperparameter-free, so it can be applied to any self-consistency application without tuning. Finally, we derive novel scaling laws for self-consistency, offering a way to predict the sample budget required for a given target error.
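To make the setup concrete, vanilla self-consistency as described above is just majority voting over repeated LLM samples. The sketch below illustrates this baseline (not Blend-ASC itself); `sample_answer` is a hypothetical callable standing in for one LLM query that returns a parsed final answer.

```python
import random
from collections import Counter

def self_consistency(sample_answer, n_samples=16):
    """Vanilla self-consistency: draw n_samples chain-of-thought
    answers and return the modal (most frequent) one.

    `sample_answer` is a placeholder for a single LLM call that
    returns the model's parsed final answer as a string.
    """
    votes = Counter(sample_answer() for _ in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

# Toy stand-in for an LLM sampler: returns the correct answer "42"
# with probability 0.7 and a wrong answer "41" otherwise.
random.seed(0)
demo_sampler = lambda: "42" if random.random() < 0.7 else "41"
print(self_consistency(demo_sampler, n_samples=15))
```

Adaptive variants such as the one proposed here reduce cost by deciding per question how many such samples to draw, rather than fixing `n_samples` globally.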
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7176