Symbolic Planning Using LLM Agents: A Cut-Based Reprompting Approach

ICLR 2026 Conference Submission21547 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Symbolic planning, Context engineering, Sequential decision making
TL;DR: We propose cut-based reprompting, a simple validator-driven prompting method that prunes invalid paths, reduces entropy, and improves LLM planning reliability without modifying model internals.
Abstract: Large Language Models (LLMs) exhibit strong reasoning abilities but remain unreliable for long-horizon planning, particularly in structured environments where a single invalid action can derail the task. We study this problem through the lens of fault-tolerant symbolic planning, where the objective is to generate valid action sequences over structured state spaces. We introduce cut-based reprompting, a context-engineering technique in which symbolic constraints (“cuts”) are injected into the prompt to forbid invalid transitions and progressively refine the decision space. Unlike standard reprompting, which is stateless and prone to repeating errors, our method accumulates symbolic feedback and can be interpreted as entropy reduction in a stochastic policy over graphs, leading to improved convergence to valid plans. We operationalize this framework with a three-agent system (Action, Validation, Reprompting) and evaluate it in two settings: (i) a graph traversal benchmark of 250 tasks across directed acyclic graphs of varying complexity, and (ii) MiniGrid case studies (Empty Room, Key-Door Room) to assess generalization in embodied planning. Across two models (LLaMA3-8B and GPT-4o-mini), we measure success rate, null-path frequency, entropy dynamics, reprompt efficiency, and token cost. Our results show that cut-based reprompting substantially improves success over naive reprompting—for example, from 60.6% to 68.1% (a 12% relative gain) on GPT-4o-mini and from 23.3% to 34.0% (a 45% relative gain) on LLaMA3—while consistently reducing invalid transitions and stabilizing planning determinism. We also identify a novel failure mode in smaller models: early entropy collapse due to over-pruning, which highlights the tradeoff between constraint tightness and search diversity. This work contributes: (1) a systematic framework for symbolic, fault-tolerant planning with LLMs, and (2) cut-based reprompting as a general-purpose mechanism for embedding symbolic memory into stateless models.
More broadly, our findings highlight the role of context as a controllable interface for reasoning, bridging symbolic AI and modern LLM planning.
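To make the propose–validate–reprompt loop concrete, here is a minimal sketch of cut-based reprompting over a directed graph. All names (`validate`, `build_prompt`, `plan_with_cuts`, the mock proposer) are hypothetical illustrations, not the authors' implementation; the key idea shown is that each invalid transition found by the validator becomes a persistent "cut" appended to every subsequent prompt.

```python
# Hypothetical sketch of cut-based reprompting (not the authors' code).
# The proposer stands in for an LLM Action agent; validate() plays the
# Validation agent; build_prompt() plays the Reprompting agent.

def validate(path, graph, start, goal):
    """Return the first invalid transition as a cut, or None if valid."""
    if not path or path[0] != start or path[-1] != goal:
        return ("structure", None)          # malformed / null path
    for u, v in zip(path, path[1:]):
        if v not in graph.get(u, []):
            return (u, v)                   # invalid edge -> becomes a cut
    return None

def build_prompt(task, cuts):
    """Inject accumulated cuts as explicit constraints in the context."""
    lines = [task]
    if cuts:
        lines.append("Forbidden transitions (never use): " +
                     ", ".join(f"{u}->{v}" for u, v in sorted(cuts)))
    return "\n".join(lines)

def plan_with_cuts(propose, graph, start, goal, task, max_reprompts=5):
    """Loop: propose a path, validate it, and reprompt with accumulated cuts."""
    cuts = set()                            # symbolic memory across attempts
    for _ in range(max_reprompts + 1):
        path = propose(build_prompt(task, cuts), cuts)
        fault = validate(path, graph, start, goal)
        if fault is None:
            return path, cuts
        if fault[1] is not None:
            cuts.add(fault)                 # prune this edge for all later attempts
    return None, cuts                       # null path: budget exhausted

# Demo with a mock proposer that first hallucinates a shortcut A->D,
# then respects the injected cut on the second attempt.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

def mock_propose(prompt, cuts):
    return ["A", "D"] if ("A", "D") not in cuts else ["A", "B", "D"]

path, cuts = plan_with_cuts(mock_propose, graph, "A", "D", "Find a path from A to D.")
```

Unlike naive reprompting, which would resend the same stateless prompt and risk repeating the `A->D` hallucination, the cut set monotonically shrinks the feasible decision space, which is the entropy-reduction view described in the abstract.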
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 21547