Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: automated planning, planning, heuristic search, search
TL;DR: We use LLMs to create state-of-the-art AI planners.
Abstract: In recent years, large language models (LLMs) have shown remarkable performance on many tasks. However, they fail to plan reliably. Specialized attempts to improve their planning capabilities still produce incorrect plans and fail to generalize to larger tasks. Furthermore, LLMs designed for explicit "reasoning" fail to compete with automated planners while increasing computational costs, which erodes one of the advantages of using LLMs. In this paper, we show how to use LLMs to always generate correct plans, even for out-of-distribution tasks of increasing size. For a given planning domain, we ask an LLM to generate several domain-dependent heuristic functions in the form of Python code, evaluate them on a set of training tasks with greedy best-first search, and choose the best one. The resulting LLM-generated heuristic functions solve substantially more unseen out-of-distribution test tasks than end-to-end LLM planning, particularly for non-reasoning LLMs. Moreover, they also solve many more tasks than state-of-the-art domain-independent heuristics for classical planning, and are competitive with the strongest learning algorithm for domain-dependent planning. These results are notable given that our implementation is based on a Python planner while the baselines all build upon highly optimized C++ code. In some domains, the LLM-generated heuristics expand fewer states than the baselines, showing that they are not only efficiently computable but also more informative than the state-of-the-art heuristics. Overall, our results show that sampling a set of planning heuristic functions can significantly improve the planning capabilities of LLMs.
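The pipeline in the abstract (run greedy best-first search with a candidate heuristic function) can be sketched as follows. This is an illustrative stand-in, not the paper's code: the toy domain, the state encoding as a frozenset of facts, and the goal-count heuristic are all assumptions; in the paper, the heuristic body would instead be Python code written by an LLM for a specific planning domain.

```python
import heapq

def greedy_best_first_search(initial, goal_facts, successors, heuristic):
    """Greedy best-first search: always expand the open state with the
    lowest heuristic value. Returns a plan (list of actions) or None."""
    # Entries are (h-value, tie-breaker, state, plan-so-far); the counter
    # keeps heapq from ever comparing states directly.
    frontier = [(heuristic(initial, goal_facts), 0, initial, [])]
    counter = 1
    visited = {initial}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal_facts <= state:  # all goal facts satisfied
            return plan
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(
                    frontier,
                    (heuristic(nxt, goal_facts), counter, nxt, plan + [action]))
                counter += 1
    return None  # search space exhausted, no plan found

def goal_count_heuristic(state, goal_facts):
    """Stand-in for an LLM-generated heuristic: count unsatisfied goal facts."""
    return len(goal_facts - state)

# Toy domain (hypothetical): facts are the integers 0..4, states are
# frozensets of facts, and each action "add-f" makes one more fact true.
def toy_successors(state):
    for f in range(5):
        if f not in state:
            yield (f"add-{f}", state | {f})

plan = greedy_best_first_search(frozenset(), {1, 3},
                                toy_successors, goal_count_heuristic)
# → ["add-1", "add-3"]
```

In the paper's setup, several LLM-generated heuristic functions would each be plugged into a search like this on a set of training tasks, and the one solving the most tasks would be kept for the unseen test tasks.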
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 10766