Abstract: Certain strong LLMs have shown promise for zero-shot formal planning by generating planning languages such as PDDL. Yet the performance of most open-source models under 100B parameters has been reported as close to zero, owing to the low-resource nature of these languages. We significantly improve their performance via a series of lightweight pipelines that integrate documentation retrieval with modular code generation and error refinement. With models such as Llama-4-Maverick, our best pipeline improves plan correctness from 0% to over 80% on the common BlocksWorld domain. However, while syntactic errors are substantially reduced, semantic errors persist in more challenging domains, revealing fundamental limitations in current models' reasoning capabilities.
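To make the pipeline the abstract describes concrete, below is a minimal Python sketch of one plausible retrieve-generate-refine loop. All names here (retrieve_pddl_docs, llm_generate, validate_pddl, generate_pddl) are hypothetical stubs for illustration, not the paper's actual implementation or API.

```python
# Hedged sketch of a documentation-retrieval + generation + error-refinement
# pipeline for PDDL. Every function below is a hypothetical placeholder.

def retrieve_pddl_docs(task: str, k: int = 3) -> list[str]:
    """Placeholder: return the k most relevant PDDL documentation snippets."""
    return ["(:action stack :parameters (?x ?y) ...)"]  # stub

def llm_generate(prompt: str) -> str:
    """Placeholder: query an LLM (e.g., Llama-4-Maverick) for PDDL text."""
    return "(define (problem blocks-1) ...)"  # stub

def validate_pddl(pddl: str) -> list[str]:
    """Placeholder: parse/validate PDDL; an empty list means no errors."""
    return []  # stub

def generate_pddl(task: str, max_refinements: int = 3) -> str:
    # 1. Documentation retrieval: ground the low-resource language's syntax.
    docs = "\n".join(retrieve_pddl_docs(task))
    # 2. Modular code generation: draft PDDL conditioned on retrieved docs.
    pddl = llm_generate(f"Docs:\n{docs}\n\nTask: {task}\nWrite PDDL:")
    # 3. Error refinement: feed validator errors back to the model and retry.
    for _ in range(max_refinements):
        errors = validate_pddl(pddl)
        if not errors:
            break
        pddl = llm_generate(f"Fix these PDDL errors:\n{errors}\n\n{pddl}")
    return pddl
```

Under this reading, the refinement loop targets syntactic errors caught by a validator, which is consistent with the abstract's observation that semantic errors can persist even after syntax is repaired.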
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: formal planning, low-resource language, domain specific language, code generation, retrieval-augmented generation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, PDDL
Submission Number: 381