CoPE: A Framework for Optimizing Coordination between Planning and Execution in LLM-based Agents

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0
Abstract: Fine-tuning Large Language Models (LLMs) as autonomous agents on domain-specific data has emerged as a promising paradigm for tackling interactive, real-world tasks. However, existing studies have overlooked the critical coordination between long-term planning and multi-step execution in optimizing agent capabilities. This oversight lets impractical plans and plan-deviating trajectories propagate through the optimization process, resulting in suboptimal task performance and hindering the further development of LLM-based agents in long-horizon tasks. To bridge this gap, we propose CoPE, a novel framework that explicitly integrates planning–execution coordination into LLM-based agent optimization. CoPE employs Self-Refining MCTS to generate task plans and multiple execution trajectories through environment interactions. By quantifying the coordination between planning and execution, CoPE assigns higher optimization weights to well-coordinated samples, enabling LLM-based agents to learn better planning and execution policies. Extensive experiments demonstrate that CoPE substantially improves agent coordination, outperforming state-of-the-art baselines on benchmarks covering two long-horizon, multi-step tasks. Code and data are available at https://github.com/Octobrist/CoPE.
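The abstract does not say how coordination is quantified or how the weights enter the training objective. The Python sketch below illustrates one possible reading of coordination-weighted optimization under clearly labeled assumptions: the `Sample` fields, the prefix-matching `coordination_score`, the softmax weighting with a `temperature` parameter, and the weighted negative log-likelihood are all hypothetical choices made for illustration, not CoPE's published formulation (the linked repository holds the actual method). It also leaves out Self-Refining MCTS entirely and assumes plans and trajectories are already available as lists of subgoal strings.

```python
# Hypothetical sketch of coordination-weighted training; NOT the CoPE implementation.
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """A hypothetical (plan, trajectory) training pair, not CoPE's actual schema."""
    plan_steps: list[str]      # subgoals in the generated plan
    executed_steps: list[str]  # subgoals the execution trajectory actually reached
    nll: float                 # negative log-likelihood of the sample under the policy


def coordination_score(plan_steps: list[str], executed_steps: list[str]) -> float:
    """Assumed metric: share of the plan whose longest prefix appears, in order, in the trajectory."""
    if not plan_steps:
        return 0.0
    matched = 0
    for step in executed_steps:
        # Advance through the plan whenever the next planned step is executed.
        if matched < len(plan_steps) and step == plan_steps[matched]:
            matched += 1
    return matched / len(plan_steps)


def coordination_weights(samples: list[Sample], temperature: float = 1.0) -> list[float]:
    """Assumed weighting: softmax over coordination scores, rescaled to average 1."""
    if not samples:
        return []
    scores = [coordination_score(s.plan_steps, s.executed_steps) for s in samples]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((x - top) / temperature) for x in scores]
    total = sum(exps)
    return [len(samples) * e / total for e in exps]


def weighted_loss(samples: list[Sample], temperature: float = 1.0) -> float:
    """Coordination-weighted mean NLL: better-coordinated samples contribute more."""
    if not samples:
        return 0.0
    weights = coordination_weights(samples, temperature)
    return sum(w * s.nll for w, s in zip(weights, samples)) / len(samples)


if __name__ == "__main__":
    batch = [
        Sample(["a", "b", "c"], ["a", "b", "c"], nll=1.2),  # follows the plan fully
        Sample(["a", "b", "c"], ["a", "x", "c"], nll=0.9),  # deviates after step one
    ]
    print(coordination_weights(batch), weighted_loss(batch))
```

Rescaling the weights to average 1 keeps the loss on the same scale as an unweighted mean, so only the relative emphasis between samples changes; a lower `temperature` would concentrate more of the weight on the best-coordinated samples.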
Lay Summary: Large language models are increasingly being used as autonomous agents that can plan and act to solve complex, real-world problems over long periods. However, these agents often struggle because their long-term plans do not align well with their actual step-by-step actions. This mismatch leads to impractical strategies and poor performance, much like a traveler who draws a perfect map but gets lost at every turn. To solve this, we introduce CoPE, a new method that teaches AI agents to better coordinate their planning with their execution. Instead of treating planning and acting as independent components, CoPE encourages the agent to learn from experiences where its actions successfully match its intentions. By rewarding this alignment, the agent becomes much better at sticking to its plans and adapting when necessary. Our experiments show that this approach significantly improves the ability of AI agents to handle difficult, multi-step tasks.
Primary Area: Deep Learning->Large Language Models
Keywords: Task Planning, Coordination Optimization, Large Language Models, Autonomous Agents
Originally Submitted PDF: pdf
Submission Number: 9748