Graph-Conditioned Diffusion for Offline Multi-Agent Reinforcement Learning
Keywords: Offline MARL, Diffusion Models, Multi-Agent Communication, Multi-Agent Generalization
TL;DR: GCD (Graph Conditioned Diffusion) is a multi-agent framework that combines graph-based communication embeddings with diffusion-based generative modeling to learn decentralized, generalizable policies offline, achieving 7% to 221% win-rate improvements on unseen team compositions.
Abstract: Multi-agent reinforcement learning struggles with scalability and real-world applicability, as the high interaction variability across team compositions limits the effectiveness of online adaptive methods. Offline RL can instead address these limitations by leveraging diverse offline data to facilitate learning across teams. However, existing offline RL methods fail to produce multi-agent policies that can both adapt using only offline data and coordinate effectively under decentralized execution. To address these challenges, we present Graph Conditioned Diffusion (GCD), a multi-agent diffusion framework that uses graph-based communication to learn generalizable policies offline and maintain decentralization during execution. Our framework leverages the conditional generative modeling ability of diffusion models to learn multi-modal distributions of trajectories across team compositions by conditioning on team communication embeddings. We then adapt coordination online through classifier-free guidance, which steers the generative process toward behaviors that generalize across team compositions. We evaluate our method on the StarCraft II Multi-Agent Challenge v2 (SMACv2) domain, demonstrating superior generalization with an average win-rate improvement of 7% to 221% on unseen team compositions compared to decentralized baselines.
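To illustrate the conditioning mechanism described in the abstract, the sketch below shows a generic trajectory denoiser conditioned on a team-communication embedding and sampled with classifier-free guidance. This is a minimal, hedged illustration of the general technique, not the authors' implementation: all module names, dimensions, and the guidance weight are assumptions introduced here for clarity.

```python
# Minimal sketch (not the authors' code): classifier-free guidance for a
# trajectory denoiser conditioned on a graph-based communication embedding.
# Shapes, module names, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class GraphConditionedDenoiser(nn.Module):
    """Predicts noise for a flattened trajectory segment, conditioned on a
    team-communication embedding (e.g., a pooled GNN output) and a timestep."""

    def __init__(self, traj_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, x_t, t, cond):
        # During training, cond would be zeroed out with some probability so
        # the model also learns an unconditional score (standard CFG setup).
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))


@torch.no_grad()
def cfg_noise_estimate(model, x_t, t, cond, guidance_weight: float = 1.5):
    """Classifier-free guidance: blend conditional and unconditional noise
    predictions to steer sampling toward the conditioned team behavior."""
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, t, torch.zeros_like(cond))
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)


if __name__ == "__main__":
    traj_dim, cond_dim, batch = 32, 16, 4
    model = GraphConditionedDenoiser(traj_dim, cond_dim)
    x_t = torch.randn(batch, traj_dim)      # noisy trajectory sample
    t = torch.randint(0, 1000, (batch,))    # diffusion timestep
    cond = torch.randn(batch, cond_dim)     # communication embedding
    eps = cfg_noise_estimate(model, x_t, t, cond, guidance_weight=2.0)
    print(eps.shape)  # torch.Size([4, 32])
```

A larger guidance weight pushes sampled trajectories more strongly toward the behavior implied by the communication embedding, at the cost of diversity; this trade-off is the standard lever in classifier-free guidance.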
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1512