Reset-and-Discard (ReD) Improves Coverage at every Budget under Inference Power-Law Scaling

Published: 29 May 2026, Last Modified: 29 May 2026. Venue: HiLD at ICML 2026 Poster. License: CC BY 4.0
Keywords: Large language models, inference scaling, test-time compute, pass@k, coverage, budgeted inference, Reset and Discard, stochastic resetting, verifier-based evaluation, compute allocation, renewal theory
Abstract: The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@$k$, the probability of answering a question correctly at least once in $k$ trials. When a fixed budget is spread across a workload of many tasks, a more suitable metric is coverage@cost: the expected number of unique questions answered as a function of the total number of attempts. We connect these metrics via renewal theory and show that the empirically observed power-law scaling of pass@$k$ (with exponent $0<\alpha<1$) leads to sublinear (diminishing-returns) growth of coverage@cost under standard solve-to-completion allocation. We propose Reset-and-Discard (ReD), a cross-problem allocation policy that provably restores linear coverage growth and maximizes coverage@cost at every budget, even under imperfect verifiers. ReD also provides a statistically efficient method to estimate inference power-law exponents when measuring pass@$k$ at large $k$ is expensive. Experiments across three LLMs and three benchmarks show large reductions in the required attempts, tokens, and USD cost.
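A small simulation can make the contrast between the two allocation policies concrete. The sketch below rests on assumptions of our own, not on the paper's model: each question has an unknown per-attempt success probability $p \sim \mathrm{Beta}(\alpha, \beta)$, which makes $1-\text{pass@}k = B(\alpha,\beta+k)/B(\alpha,\beta)$ decay like $k^{-\alpha}$, and the verifier is perfect. Under solve-to-completion, each question is a renewal cycle whose expected length $\mathbb{E}[1/p]$ is infinite when $\alpha \le 1$, so coverage grows sublinearly in the number of attempts. The function `reset_and_discard` and its `max_tries` parameter are hypothetical illustrations of a reset-and-discard policy, not the authors' ReD algorithm.

```python
import math
import random

# Assumption (illustrative, not from the paper): each question's per-attempt
# success probability is p ~ Beta(alpha, beta), so that
#   1 - pass@k = E[(1 - p)^k] = B(alpha, beta + k) / B(alpha, beta) ~ C * k**(-alpha).


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pass_at_k(k: int, alpha: float, beta: float) -> float:
    """Expected pass@k of a random question under the Beta(alpha, beta) assumption."""
    return 1.0 - math.exp(log_beta(alpha, beta + k) - log_beta(alpha, beta))


def solve_to_completion(ps, budget, rng):
    """Retry the current question until it is solved, then move to the next.

    Returns the number of unique questions solved within `budget` attempts.
    """
    solved, i = 0, 0
    for _ in range(budget):
        if i >= len(ps):  # every question already solved
            break
        if rng.random() < ps[i]:  # perfect verifier: success is observed exactly
            solved += 1
            i += 1
    return solved


def reset_and_discard(ps, budget, rng, max_tries=1):
    """Hypothetical ReD-style policy (not the authors' exact algorithm).

    Each unsolved question gets at most `max_tries` attempts per pass before it is
    discarded in favor of the next one; once every question has been visited,
    another pass is made over the questions that are still unsolved.
    """
    unsolved = list(range(len(ps)))
    solved, used = 0, 0
    while used < budget and unsolved:
        still_unsolved = []
        for i in unsolved:
            hit = False
            tries = 0
            while tries < max_tries and used < budget:
                tries += 1
                used += 1
                if rng.random() < ps[i]:
                    hit = True
                    break
            if hit:
                solved += 1
            else:
                still_unsolved.append(i)
        unsolved = still_unsolved
    return solved


if __name__ == "__main__":
    alpha, beta = 0.5, 2.0  # illustrative values with 0 < alpha < 1

    # Local log-log slope of 1 - pass@k between k = 1000 and k = 2000 (close to -alpha).
    f1 = 1.0 - pass_at_k(1000, alpha, beta)
    f2 = 1.0 - pass_at_k(2000, alpha, beta)
    print(f"slope of log(1 - pass@k): {math.log(f2 / f1) / math.log(2.0):.3f}")

    gen = random.Random(0)
    ps = [gen.betavariate(alpha, beta) for _ in range(5000)]
    for budget in (500, 2000, 8000):
        stc = solve_to_completion(ps, budget, random.Random(1))
        red = reset_and_discard(ps, budget, random.Random(1))
        print(f"budget={budget:5d}  solve-to-completion={stc:5d}  reset-and-discard={red:5d}")
```

With `max_tries=1`, a failed question is set aside immediately. Under the Beta assumption this is a sensible choice: one failure lowers that question's posterior mean success probability from $\alpha/(\alpha+\beta)$ to $\alpha/(\alpha+\beta+1)$, so while fresh questions remain, a fresh question is always the better next attempt.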
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 107