GraphPlanner: Graph-Based Agentic Routing for LLMs

ICLR 2026 Conference Submission 20197 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Agentic LLM, Memory utilization, Heterogeneous agents

TL;DR: GraphPlanner is a graph-based framework that enables agentic LLM routing by modeling cooperation and memory with reinforcement learning, achieving scalable, efficient, and generalizable routing.

Abstract: LLM routing has achieved promising results in integrating the strengths of diverse models while balancing efficiency and performance. However, to support more realistic and challenging applications, routing must extend into agentic LLM settings, where task planning, multi-round cooperation among heterogeneous agents, and memory utilization are indispensable. To address this gap, we propose GraphPlanner, a heterogeneous graph-based agentic router that generates routing workflows for each query and supports both inductive and transductive inference. GraphPlanner formulates workflow generation as a Markov Decision Process (MDP), where at each step it selects both the LLM backbone and the agent role (Planner, Executor, Summarizer). By leveraging a heterogeneous graph, denoted as GARNet, to capture interactions among queries, agents, and responses, GraphPlanner integrates historical and contextual information into richer state representations. The entire pipeline is optimized with reinforcement learning, jointly improving task-specific performance and computational efficiency. We evaluate GraphPlanner across 14 diverse LLM tasks and demonstrate that: (1) GraphPlanner outperforms strong single- and multi-round routers, improving accuracy by up to 9.3% while reducing GPU cost from 186.26 GiB to 1.04 GiB; (2) GraphPlanner generalizes robustly to unseen tasks and LLMs, exhibiting strong zero-shot capabilities; and (3) GraphPlanner effectively leverages historical interactions, supporting both inductive and transductive inference for more adaptive routing.
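
The MDP formulation in the abstract, where each step selects both an LLM backbone and an agent role, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the backbone names, the `State` class, the `reward` function with its `cost_weight`, and the random policy are all hypothetical stand-ins. Only the three role names come from the abstract. The learned policy and the GARNet state representation are not modeled here.

```python
import itertools
import random
from dataclasses import dataclass, field

# Agent roles named in the abstract.
ROLES = ("Planner", "Executor", "Summarizer")
# Hypothetical backbone names, used only for illustration.
LLMS = ("small-llm", "large-llm")

# Each action pairs one LLM backbone with one agent role.
ACTIONS = list(itertools.product(LLMS, ROLES))


@dataclass
class State:
    """Assumed state: the query plus the (llm, role) pairs chosen so far."""
    query: str
    history: list = field(default_factory=list)


def step(state, action):
    """Transition: extend the workflow with the chosen (llm, role) pair."""
    return State(state.query, state.history + [action])


def reward(quality, cost, cost_weight=0.1):
    """Hypothetical scalar reward trading task quality against compute cost."""
    return quality - cost_weight * cost


def rollout(query, policy, max_steps=3):
    """Generate one routing workflow by applying the policy step by step."""
    state = State(query)
    for _ in range(max_steps):
        state = step(state, policy(state))
    return state.history


_rng = random.Random(0)


def random_policy(state):
    """Placeholder policy; a trained router would condition on the state."""
    return _rng.choice(ACTIONS)


workflow = rollout("example query", random_policy)
```

In this sketch a workflow is simply the ordered list of (backbone, role) pairs. A reinforcement-learning setup along the lines the abstract describes would replace `random_policy` with a learned policy and compute `reward` from measured task performance and cost.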

Primary Area: learning on graphs and other geometries & topologies

Submission Number: 20197
