Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Keywords: multi-agent reinforcement learning, world models, learning in imagination
TL;DR: We introduce the first Transformer-based multi-agent world model, which enables sample-efficient multi-agent policy learning in imagination.
Abstract: Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve sample efficiency by allowing policies to be learned in imagination. However, building a world model for Multi-Agent RL (MARL) is particularly challenging due to the scalability issue of a centralized architecture, which arises from the large number of agents, and the non-stationarity issue of a decentralized architecture, which stems from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation over all agents. We cast dynamics learning as an auto-regressive sequence modeling problem over discrete tokens, leveraging the expressive Transformer architecture to model complex local dynamics across agents and to provide accurate and consistent long-term imaginations. As the first Transformer-based world model for multi-agent systems, our method introduces a Perceiver Transformer for effective centralized representation aggregation. Main results on the StarCraft Multi-Agent Challenge (SMAC) and additional results on MAMuJoCo show that our method outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
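To make the described architecture concrete, below is a minimal, illustrative PyTorch sketch of the two components named in the abstract: a decentralized per-agent Transformer that models local dynamics auto-regressively over discrete tokens, and a Perceiver-style module that aggregates all agents' representations through cross-attention. All module names, dimensions, and wiring here are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch (assumed details): decentralized local dynamics per agent
# plus Perceiver-style centralized aggregation across agents.
import torch
import torch.nn as nn


class LocalDynamics(nn.Module):
    """Decentralized per-agent dynamics: autoregressive Transformer over discrete tokens."""

    def __init__(self, vocab_size=512, d_model=128, n_layers=2, n_heads=4, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)  # next-token logits

    def forward(self, tokens):
        # tokens: (batch, seq_len) discrete token ids for one agent
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # causal mask enforces auto-regressive sequence modeling
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        h = self.encoder(x, mask=mask)
        return h, self.head(h)


class PerceiverAggregator(nn.Module):
    """Centralized aggregation: learned latent queries cross-attend to all agents' states."""

    def __init__(self, d_model=128, n_latents=8, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, agent_states):
        # agent_states: (batch, n_agents, d_model), one summary vector per agent
        q = self.latents.unsqueeze(0).expand(agent_states.size(0), -1, -1)
        agg, _ = self.cross_attn(q, agent_states, agent_states)
        return agg  # (batch, n_latents, d_model) shared global context


if __name__ == "__main__":
    n_agents, batch, seq_len = 3, 4, 16
    local, agg = LocalDynamics(), PerceiverAggregator()
    tokens = torch.randint(0, 512, (n_agents, batch, seq_len))
    # each agent models its own token sequence; its last hidden state is its summary
    summaries = torch.stack([local(tokens[i])[0][:, -1] for i in range(n_agents)], dim=1)
    print(agg(summaries).shape)  # torch.Size([4, 8, 128])
```

In this sketch the local model is shared across agents only for brevity; how parameters are shared, how tokens are produced from observations, and how the aggregated context feeds back into imagination are design choices the paper itself specifies.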
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10860