Keywords: Offline Reinforcement Learning, Multi-Agent System, Large Language Model, Sequential Decision-Making
Abstract: Large language model (LLM)-based multi-agent systems have shown promise for collaborative problem solving, yet maintaining coordination remains challenging. In interactive environments, each agent observes only part of the task state, and locally reasonable decisions can gradually lead to misalignment, which we refer to as $\textit{coordination drift}$. To address this problem, we propose $\textbf{DriCo}$, a $\textbf{dri}$ft-aware $\textbf{co}$ordination framework for LLM-based multi-agent systems. DriCo introduces a coordinator that constructs a shared context from agent-level information and guides team-level decision-making. Each agent follows an LLM-based hierarchical policy composed of a planner for sub-goal generation and an actor for low-level Q-guided execution. We formulate coordination as a sequential process over planning steps and optimize the coordinator with a drift-derived, preference-based objective that favors shared contexts that reduce coordination drift and support stable long-horizon execution.
We further introduce $\textbf{LLM-Overcooked}$, an LLM-oriented extension of Overcooked-AI with separate training and evaluation environments, diverse layouts and recipes, and held-out configurations that evaluate whether coordination generalizes, within a layout, to held-out recipe compositions.
Experiments show that DriCo improves coordination stability and reduces coordination drift.
Submission Number: 148