Keywords: LLMs, Reasoning, Test Time Inference, Backtracking, In-Context Verifiers
TL;DR: Combining multi-step and multi-turn reasoning with preemptive backtracking and per-step value verification improves inference efficiency by revising mistakes early in the reasoning process.
Abstract: Solving reasoning problems is an iterative, multi-step computation: a reasoning agent progresses through a sequence of steps, each building logically on the previous one to reach a desired conclusion. If the desired solution is not attained, the agent must backtrack and try reasoning chains that differ substantially from previous attempts. Although prior work shows that test-time search against an outcome verifier can improve performance, most such search is done in parallel via Best-of-N reranking, independently for each attempt at a problem, wasting a significant amount of computation on sampling multiple full solutions, often well beyond what is needed. Can we reduce the total amount of computation by sharing information and computation across multiple attempts at a given problem? In this paper, we build a novel approach that combines process verifiers, which predict likelihoods of success \emph{per step}, with preemptive backtracking to maximize performance per generated token. Concretely, the process reward model (PRM) is used to identify the problematic step in a solution trace via the sensitivity of the learned verifier's predictions, allowing the model to perform focused resampling of only the problematic portion of the solution. This approach can significantly reduce computation by leveraging partial computation from previous revisions. To further enhance the computational efficiency of inference, we introduce in-context process supervision, where the verifier is conditioned on the history of attempted revisions, reducing uncertainty in verification decisions and improving the verifier's confidence with each round of backtracking. This framework for iterative backtracking with in-context process supervision enables an effective tradeoff between inference-time computation and model performance.
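For intuition, the control flow described in the abstract can be sketched in a few lines of Python. This is a minimal, illustrative sketch only, not the paper's implementation: `generate_step`, `prm_score`, the drop threshold, and the revision cap are all hypothetical placeholders standing in for the LLM step generator and the learned per-step verifier. It shows the three ideas the abstract names: per-step verification, preemptive backtracking triggered by a drop in the verifier's predicted success (with resampling of only the problematic step while the trusted prefix is reused), and conditioning the verifier on the history of attempted revisions (in-context process supervision).

```python
import random
from typing import List

# Hypothetical stand-ins for the LLM step generator and the process reward model (PRM).
# Here they are random stubs so the control flow runs end to end.
def generate_step(prefix: List[str], history: List[List[str]]) -> str:
    # Propose the next reasoning step given the current prefix and past revisions.
    return f"step_{len(prefix)}_rev{len(history)}"

def prm_score(prefix: List[str], step: str, history: List[List[str]]) -> float:
    # Predicted likelihood that a solution continuing from prefix + [step] succeeds.
    # Conditioning on `history` mimics in-context process supervision.
    return random.uniform(0.0, 1.0)

def solve_with_backtracking(max_steps: int = 8, drop_threshold: float = 0.3,
                            max_revisions: int = 4) -> List[str]:
    history: List[List[str]] = []   # previously attempted (and rejected) traces
    prefix: List[str] = []          # steps accepted so far
    scores: List[float] = []        # verifier scores for accepted steps
    revisions = 0
    while len(prefix) < max_steps:
        step = generate_step(prefix, history)
        score = prm_score(prefix, step, history)
        prev = scores[-1] if scores else 1.0
        # A sharp drop in the verifier's predicted success flags a problematic step:
        # record the failed trace, discard only the offending step, and resample it,
        # reusing the still-trusted prefix instead of regenerating the full solution.
        if prev - score > drop_threshold and revisions < max_revisions:
            history.append(prefix + [step])
            revisions += 1
            continue
        prefix.append(step)
        scores.append(score)
    return prefix

if __name__ == "__main__":
    print(solve_with_backtracking())
```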
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13445