Track: tiny / short paper (up to 5 pages)
Keywords: sampling, tree search, self-consistency
TL;DR: We can avoiding repeting prefixes in self-consistency by sampling without replacement in a truncated tree space
Abstract: Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly generating the same early prefixes. We show that this can be fully replaced by a deterministic enumeration of completions. While enumerating exponentially many completions would become infeasible for standard samplers, we show that, when the search space is truncated to tokens above a probability threshold, partial enumeration of the most likely or the most diverse branches is significantly more efficient. We propose ε-Leaf Enumeration (ELE), a deterministic tree-search method that imposes a strict non-repetition condition, covering the search space more optimally than via sampling. ELE naturally enables prefix reuse by caching shared prefixes across branches to redundant computation. Empirically, at equal token budgets, ELE explores more distinct completions than self-consistency which translates into consistent improvements in math, coding and general reasoning tasks through efficient parallel reasoning.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 76
Loading