Abstract: Recently, a plethora of works have proposed inference-time algorithms (e.g., best-of-n) that incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape remains poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier, which can decide whether a prefix can be extended to a string that satisfies the constraints of choice. We show that even in very simple settings, access to a verifier can render an intractable problem (information-theoretically or computationally) tractable. In fact, we show that even simple algorithms, like tokenwise rejection sampling, can enjoy significant benefits from access to a verifier. Empirically, we show that a natural modification of tokenwise rejection sampling, in which the sampler is allowed to "backtrack" (i.e., erase the final few generated tokens), has robust and substantive benefits over natural baselines (e.g., (blockwise) rejection sampling, nucleus sampling) in terms of computational efficiency, accuracy, and diversity.
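To make the algorithmic idea concrete, below is a minimal Python sketch of tokenwise rejection sampling with backtracking under the paper's two oracles. The names `sample_next_token` and `is_extendable`, and the fixed backtrack depth `backtrack_k`, are illustrative assumptions and not the authors' implementation (see the linked repository for that).

```python
# A minimal sketch (not the authors' implementation) of verifier-assisted
# tokenwise rejection sampling with backtracking, as described above.
# `sample_next_token` (the generator oracle) and `is_extendable` (the
# process verifier) are hypothetical stand-ins supplied by the caller.
from typing import Callable, List, Optional

def backtracking_rejection_sample(
    sample_next_token: Callable[[List[str]], str],  # generator oracle: prefix -> next token
    is_extendable: Callable[[List[str]], bool],     # process verifier: can this prefix be completed?
    max_len: int,
    backtrack_k: int = 2,       # how many trailing tokens to erase on rejection
    max_queries: int = 10_000,  # total query budget
    eos: str = "<eos>",
) -> Optional[List[str]]:
    prefix: List[str] = []
    for _ in range(max_queries):
        candidate = prefix + [sample_next_token(prefix)]
        if is_extendable(candidate):
            prefix = candidate  # accept: the verifier says a valid completion still exists
            if prefix[-1] == eos or len(prefix) >= max_len:
                return prefix
        else:
            # Backtrack: instead of merely discarding the rejected token
            # (plain tokenwise rejection sampling), also erase the final
            # few accepted tokens to escape locally unpromising prefixes.
            prefix = prefix[: max(0, len(prefix) - backtrack_k)]
    return None  # query budget exhausted
```

Setting `backtrack_k = 0` recovers plain tokenwise rejection sampling, which only resamples the rejected token.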
Lay Summary: Language models are often asked to generate text that must satisfy hard constraints. A popular strategy is to pair the model with a "verifier": a tool that checks whether a partially generated text can still be completed into a valid answer. We build a mathematical framework to study when such verifiers help, and show that even in very simple settings, access to a verifier can turn a problem that is impossible or prohibitively expensive to solve into a tractable one. Even simple algorithms, like tokenwise rejection sampling, benefit significantly from a verifier. In experiments, a natural variant of tokenwise rejection sampling that consults the verifier after each token, and erases the last few tokens ("backtracks") when rejected, robustly outperforms natural baselines such as (blockwise) rejection sampling and nucleus sampling in computational efficiency, accuracy, and diversity.
Link To Code: https://github.com/YuchenLi01/LM_Query_Complexity
Primary Area: Deep Learning->Theory
Keywords: verifier-assisted language generation, query complexity, inference-time scaling, best-of-N sampling, constrained generation, theory
Submission Number: 14496