Keywords: Inference-Time Search, Online Learning, Adaptivity
TL;DR: We introduce a learning-theoretic framework for LLM inference-time search, formalizing the tradeoff between cheap rewards and costly verifiers. We propose a near-optimal adaptive algorithm and establish a separation from active learning.
Abstract: Many inference-time language-model pipelines
combine a cheap reward signal with an expensive
verifier, such as exact answer checking in
mathematical reasoning or hidden-test execution
in code generation. Through a learning-theoretic lens,
we formalize this setting as generative active
search: a cost-sensitive first-positive search
problem in which a policy adaptively samples
candidates from an unknown distribution, observes
cheap scores, and pays for verifier labels
until it finds a positive example. For a
fixed prompt, the generator and reward model induce
two unknown objects: a distribution over
reward scores and a score-conditioned success
function. When these quantities are known, we
characterize the distribution-aware optimal policy
via dynamic programming. In
the realistic setting where both the
score distribution and success function are unknown,
we propose ADAP, a shellwise adaptive
generate-rank-verify algorithm that progressively
increases the number of sampled responses and
top-ranked verifications. Under the monotonicity
assumption that higher reward scores are no
less likely to pass verification, we show that
ADAP achieves expected cost within a constant
factor of the distribution-aware optimum. We
complement this result with learning-theoretic
lower bounds, based on a centered star number,
showing that structural assumptions on the score–label
relationship are necessary. Experiments
on mathematical reasoning and competitive programming
validate the predicted advantage over
both fixed non-adaptive policies and difficulty-adaptive
baselines.
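To make the generate-rank-verify loop concrete, the following is a minimal Python sketch of a shellwise search, not the ADAP algorithm itself. The abstract does not state the shell schedule, so the choices here are assumptions made purely for illustration: shell k draws `ratio * 2**k` new candidates and verifies the `2**k` highest-scoring unverified ones. The names `shellwise_search`, `sample`, `verify`, `gen_cost`, `verify_cost`, `ratio`, and `max_shells` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def shellwise_search(
    sample: Callable[[], Tuple[str, float]],  # returns (candidate, cheap reward score)
    verify: Callable[[str], bool],            # expensive verifier: True means positive
    gen_cost: float = 1.0,                    # assumed cost of one sample plus its score
    verify_cost: float = 10.0,                # assumed cost of one verifier call
    ratio: int = 4,                           # assumed samples drawn per verification
    max_shells: int = 10,
) -> Tuple[Optional[str], float]:
    """Return (first verified positive or None, total cost paid)."""
    pool: List[Tuple[float, str]] = []  # unverified (score, candidate) pairs
    total_cost = 0.0
    for k in range(max_shells):
        budget = 2 ** k  # assumed schedule: verification budget doubles per shell
        # Generate: grow the candidate pool.
        for _ in range(ratio * budget):
            candidate, score = sample()
            pool.append((score, candidate))
            total_cost += gen_cost
        # Rank: highest cheap score first (relies on the monotonicity assumption).
        pool.sort(key=lambda pair: pair[0], reverse=True)
        to_check, pool = pool[:budget], pool[budget:]
        # Verify: pay for labels until the first positive is found.
        for _, candidate in to_check:
            total_cost += verify_cost
            if verify(candidate):
                return candidate, total_cost
    return None, total_cost
```

Candidates that are ranked but not verified stay in the pool, so a later shell can still reach them once the verification budget grows. Under the monotonicity assumption, ranking by the cheap score spends verifier calls on the candidates that are most likely to pass.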
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 169