Language Models For PDDL Planning: Generating Sound and Programmatic Policies

Published: 21 Jun 2025, Last Modified: 25 Jul 2025, Venue: RLC 2025 Workshop PRL, License: CC BY 4.0
Keywords: planning, PDDL, language models, programs, value functions, policies
Abstract: We study the use of language models (LMs) for planning over world models specified in the Planning Domain Definition Language (PDDL). Prior work has shown that LMs cannot plan autonomously in a manner that is sound with respect to an input PDDL model. Researchers have therefore proposed combining LMs with existing PDDL planners and external verifiers. In this work, we provide LMs with the PDDL domain and two example problems, and prompt them to generate Python programs that serve as generalised policies. Such generalised policies can solve unseen PDDL problems and are sound relative to the PDDL input *without reliance on existing PDDL planners or external verifiers*. Experiments on recent competition benchmarks show that our policies solve more problems than state-of-the-art PDDL planners and recent LM approaches. We further analyse the statistical correlation between validation and test metrics to select the best model configurations for test-time evaluation. Our approach is implemented in the LMPlan planner, which can solve planning problems with several hundred relevant objects. Surprisingly, we observe that LMs used in our framework sometimes plan more effectively over PDDL problems written in meaningless symbols in place of natural language; e.g. rewriting `(at dog kitchen)` as `(predicate2 obj1 obj3)`. This finding challenges previous hypotheses that LMs primarily reason over word semantics and memorise solutions from their training corpora.
Format: We have read the camera-ready instructions, and our paper is formatted with the provided template.
De-Anonymization: This submission has been de-anonymized.
Presenter: ~Dillon_Ze_Chen1
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 5
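
The abstract's central idea, a Python program acting as a generalised policy whose output is sound with respect to the model, can be illustrated with a minimal sketch. This is not LMPlan's code: the one-action `move` domain, the tuple encoding of atoms, and the names `applicable`, `apply_action`, `policy` and `run_policy` are all hypothetical, chosen only to show how checking each proposed action against the model's preconditions yields a plan that is valid by construction.

```python
from typing import FrozenSet, List, Optional, Tuple

Atom = Tuple[str, ...]    # e.g. ("at", "dog", "kitchen")
State = FrozenSet[Atom]
Action = Tuple[str, ...]  # e.g. ("move", "dog", "kitchen", "garden")


def applicable(state: State, action: Action) -> bool:
    """Toy model: `move obj src dst` requires (at obj src) and src != dst."""
    if len(action) != 4 or action[0] != "move":
        return False
    _, obj, src, dst = action
    return src != dst and ("at", obj, src) in state


def apply_action(state: State, action: Action) -> State:
    """Effects of `move`: delete (at obj src), add (at obj dst)."""
    _, obj, src, dst = action
    return (state - {("at", obj, src)}) | {("at", obj, dst)}


def policy(state: State, goal: State) -> Optional[Action]:
    """Hypothetical generalised policy: move any misplaced object to its goal."""
    for _, obj, dst in sorted(goal - state):
        for pred, other, src in sorted(state):
            if pred == "at" and other == obj:
                return ("move", obj, src, dst)
    return None  # the policy has no suggestion


def run_policy(init: State, goal: State, max_steps: int = 1000) -> Optional[List[Action]]:
    """Execute the policy, accepting an action only if the model allows it."""
    state, plan = init, []
    for _ in range(max_steps):
        if goal <= state:  # every goal atom holds
            return plan
        action = policy(state, goal)
        if action is None or not applicable(state, action):
            return None  # fail rather than return an invalid plan
        state = apply_action(state, action)
        plan.append(action)
    return None


init = frozenset({("at", "dog", "kitchen"), ("at", "cat", "garden")})
goal = frozenset({("at", "dog", "garden")})
print(run_policy(init, goal))  # [('move', 'dog', 'kitchen', 'garden')]
```

In this sketch the policy may be wrong or incomplete, but `run_policy` never returns a plan containing an inapplicable action: it either reaches the goal through model-checked steps or reports failure with `None`.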