AutoPDL: Automatic Prompt Optimization for LLM Agents

Claudio Spiess; Mandana Vaziri; Louis Mandel; Martin Hirzel

AutoPDL: Automatic Prompt Optimization for LLM Agents

Claudio Spiess, Mandana Vaziri, Louis Mandel, Martin Hirzel

Published: 03 Jun 2025, Last Modified: 23 Jun 2025AutoML 2025 Methods TrackEveryoneRevisionsBibTeXCC BY 4.0

Confirmation: our paper adheres to reproducibility best practices. In particular, we confirm that all important details required to reproduce results are described in the paper,, the authors agree to the paper being made available online through OpenReview under a CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/), and, the authors have read and commit to adhering to the AutoML 2025 Code of Conduct (https://2025.automl.cc/code-of-conduct/).

Reproducibility: zip

TL;DR: We optimize agentic and non-agentic prompt programs using an AutoML approach and find significant improvements in some settings.

Abstract: The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and non-transferable across LLMs or tasks. Therefore, this paper proposes AutoPDL, an automated approach to discover good LLM agent configurations. Our method frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and six LLMs (ranging from 3B to 70B parameters) show consistent accuracy gains ($9.06\pm15.3$ percentage points), up to 68.9pp, and reveal that selected prompting strategies vary across models and tasks.

Submission Number: 17

Loading