Keywords: Programming-by-example, program synthesis, large language models
TL;DR: We propose to sample, combine and execute different lines of code from large language models to perform an execution-guided search for correct programs within one prompt.
Abstract: Soundness is an important property in programming-by-example (PBE), as it allows synthesizers to search over a domain-specific language (DSL) and terminate as soon as any sound program is found.
Large language models (LLMs) can generate code from examples without being limited to a DSL, but they lack search, as samples are independent.
One can sample code until a sound program is generated, but this is very inefficient.
In this paper, we use an LLM as a policy that generates lines of code, and then join these lines so that the LLM implicitly estimates the value of each line in its next iteration.
We further guide the policy and value estimation by executing each line and annotating it with its results on the given examples.
This lets the policy reason in both the syntactic (code) and semantic (execution) space, allowing us to search for programs within a single, expanding prompt until a sound program is found.
We evaluate this approach on straight-line Python code generation using five benchmarks across different domains (string transformations, list transformations, and arbitrary Python programming problems).
We show that the model effectively uses the execution results to guide the search and that within-prompt search performs well at low token budgets.
We also analyze how the model behaves as a policy and value function, and show that it can parallelize the search and implicitly backtrack over earlier generations.
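
Below is a minimal, illustrative sketch (in Python) of the within-prompt search loop the abstract describes. All names here (`sample_line`, the `x`/`out` variable convention, the annotation format) are assumptions of this sketch rather than the paper's actual interface, and for clarity it grows a single program greedily, whereas the paper's method can branch and implicitly backtrack within the prompt.

```python
def execute(lines, x):
    """Run a straight-line Python program on one example input.

    Sketch assumption: the input is bound to `x` and the program
    leaves its result in `out`.
    """
    env = {"x": x}
    try:
        for line in lines:
            exec(line, env)
        return env.get("out")
    except Exception as exc:
        return exc  # surface the failure so it can be annotated


def within_prompt_search(sample_line, examples, max_steps=32):
    """Grow a single prompt with execution-annotated lines of code.

    `sample_line(prompt)` is a hypothetical LLM call that returns one
    candidate next line of code; `examples` is a list of (input, output)
    pairs. The search stops when a program is sound on all examples.
    """
    targets = [y for _, y in examples]
    prompt = f"# I/O examples: {examples}\n"
    program = []
    for _ in range(max_steps):
        line = sample_line(prompt)           # LLM acts as the policy
        candidate = program + [line]
        results = [execute(candidate, x) for x, _ in examples]
        # Annotate the sampled line with its results on the examples;
        # reading the expanding prompt, the LLM can implicitly value,
        # reuse, or abandon earlier lines on its next call.
        prompt += f"{line}  # -> {results}\n"
        if results == targets:
            return candidate                 # sound on all given examples
        if not any(isinstance(r, Exception) for r in results):
            program = candidate              # keep lines that still execute
    return None                              # budget exhausted, no sound program
```

One design point this sketch preserves: even failing lines are appended to the prompt with their execution results, so the model is guided away from them, matching the abstract's claim that execution results steer the search.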
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12254