Language Models Speed Up Local Search for Finding Programmatic Policies

Published: 11 Nov 2024, Last Modified: 11 Nov 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: Encoding policies that solve sequential decision-making problems as programs offers advantages over neural representations, such as interpretability and modifiability of the policies. On the downside, programmatic policies are elusive because generating them requires searching in spaces of programs that are often discontinuous. In this paper, we leverage the ability of large language models (LLMs) to write computer programs to speed up the synthesis of programmatic policies. We use an LLM to provide initial candidates for the policy, which are then improved by local search. Empirical results in three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We conjecture that LLMs are effective in this setting because we give them access to the outcomes of the policies' rollouts. That way, LLMs can try policies encoding different behaviors once they observe what a previous policy has accomplished. This process forces the search to explore different parts of the space through "exploratory initial programs". Experiments also show that much of the knowledge LLMs leverage comes from the domain-specific language that defines the search space: the overall performance of the system drops sharply if we change the names of the functions used in the language to meaningless names. Since our system only queries the LLM in the first step of the search, it offers an economical method for using LLMs to guide the synthesis of policies.
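The pipeline the abstract describes, seeding a local search with LLM-proposed candidates and improving them with rollout feedback, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_propose`, `neighbors`, and `env_rollout` are hypothetical callables standing in for the LLM query, the program-mutation operator of the local search, and the policy-rollout evaluator, respectively.

```python
def local_search(initial_program, neighbors, env_rollout, budget=100):
    """Greedy hill climbing: repeatedly move to the best-scoring
    neighbor of the current program until no neighbor improves the
    rollout score or the budget is exhausted."""
    best = initial_program
    best_score = env_rollout(best)
    for _ in range(budget):
        improved = False
        for candidate in neighbors(best):
            score = env_rollout(candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best, best_score

def llm_seeded_search(llm_propose, neighbors, env_rollout,
                      n_seeds=3, budget=100):
    """Query the LLM once for initial candidate programs, then refine
    each seed with local search and return the best result found."""
    seeds = llm_propose(n_seeds)
    results = [local_search(s, neighbors, env_rollout, budget)
               for s in seeds]
    return max(results, key=lambda r: r[1])
```

As a toy instance, let "programs" be integers, the rollout score be `-(x - 7)**2` (peaked at 7), and the neighbor operator return `x - 1` and `x + 1`; seeding from several LLM "guesses" and hill climbing then converges to 7 regardless of which seed is closest.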
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=1Fm54zGgid
Changes Since Last Submission: Dear Editors-in-Chief: Our previous submission was desk rejected because we inadvertently changed the font of the document. The problem is fixed and here we resubmit our work. We apologize for the oversight and the inconvenience this might have caused. Regards, Authors
Assigned Action Editor: ~Dennis_J._N._J._Soemers1
Submission Number: 3128