Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Yang Zhang; Xinran Li; Jianing Ye; Shuang Qiu; Delin Qu; Xiu Li; Chongjie Zhang; Chenjia Bai

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Yang Zhang, Xinran Li, Jianing Ye, Shuang Qiu, Delin Qu, Xiu Li, Chongjie Zhang, Chenjia Bai

Published: 18 Sept 2025, Last Modified: 15 Dec 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multi-Agent Model-based Reinforcement Learning, World Models, Learning in Imaginations, Diffusion Models

TL;DR: We introduce a sample-efficient Diffusion-Inspired Multi-Agent world model (DIMA) for multi-agent control environments--MAMuJoCo and Bi-DexHands

Abstract: World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents' actions in a multi-agent system aligns with the reverse process in diffusion models—a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, \textbf{D}iffusion-\textbf{I}nspired \textbf{M}ulti-\textbf{A}gent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, significantly outperforming prior world models in terms of final return and sample efficiency, including MAMuJoCo and Bi-DexHands. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research. Codes are open-sourced at \url{https://github.com/breez3young/DIMA}.

Supplementary Material: zip

Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)

Submission Number: 19853

Loading