Abstract: In recent work on AI planning, Large Language Models (LLMs) are used either as planners that generate executable plans directly, or as formalizers that represent the planning domain and problem in a formal language from which plans can be derived deterministically. However, both lines of work rely on standard benchmarks that include only generic and simplistic environmental specifications, leaving the robustness of LLMs' planning ability understudied. We bridge this gap by augmenting widely used planning domains with manually annotated, fine-grained, and rich natural language constraints spanning five distinct categories. Our experiments show that introducing constraints significantly decreases performance across all methods, and that the two methodologies each excel on different types of constraints.
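The following is a minimal sketch, not the paper's released code, contrasting the two methodologies named in the abstract: the LLM as planner versus the LLM as formalizer. `call_llm` and `run_classical_planner` are hypothetical stand-ins for a chat-completion API and a deterministic planner (e.g., Fast Downward); the task and constraint wording are illustrative only.

```python
def call_llm(prompt: str) -> str:
    """Stub: in practice, query an LLM here."""
    return "<model output>"


def run_classical_planner(domain_pddl: str, problem_pddl: str) -> str:
    """Stub: in practice, invoke a classical planner on the generated PDDL."""
    return "<plan derived deterministically>"


# A task description augmented with a fine-grained natural-language constraint,
# in the spirit of the paper's annotations (exact wording is hypothetical).
task = (
    "Stack block A on block B. "
    "Constraint: the gripper must never pick up block C."
)

# Methodology 1: LLM as planner -- the model emits an executable plan directly.
plan_from_llm = call_llm(f"Write a step-by-step executable plan.\n{task}")

# Methodology 2: LLM as formalizer -- the model emits a formal problem
# specification (e.g., PDDL); a classical planner then derives the plan.
problem_pddl = call_llm(f"Translate the task into a PDDL problem file.\n{task}")
plan_from_formalizer = run_classical_planner("<blocksworld domain PDDL>", problem_pddl)
```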
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: planning, constraints, code generation, large language models, formal reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 228