Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning

TMLR Paper 2719 Authors

19 May 2024 (modified: 23 May 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: In this paper, we propose a novel model-based multi-agent reinforcement learning approach, the Value Decomposition Framework with Disentangled World Model, to address the challenge of multiple agents pursuing a common goal in a shared environment with reduced sample complexity. Because of the scalability and non-stationarity problems posed by multi-agent systems, model-free methods require a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to disentangle the complicated environment dynamics. Our model produces imagined outcomes from past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations of the world model, which is then merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. Experimental results on StarCraft II micro-management, Multi-Agent MuJoCo, and Level-Based Foraging challenges demonstrate that our method achieves high sample efficiency and outperforms the baselines across a wide range of multi-agent learning tasks.
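To make the architecture described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' code) of the named components: a world model split into action-conditioned, action-free, and static branches whose latents are merged and combined with per-agent utilities through a monotonic, QMIX-style mixer to estimate a joint action-value. All module names, layer sizes, and the specific mixer are assumptions for illustration only.

```python
# Hypothetical sketch of a disentangled world model feeding a value-decomposition mixer.
# Names, dimensions, and the QMIX-style mixing are illustrative assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn


class VAEBranch(nn.Module):
    """Small VAE-style encoder: input -> (mu, logvar) -> reparameterised latent."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)  # reparameterisation trick


class DisentangledWorldModel(nn.Module):
    """Three branches: dynamics that depend on actions, dynamics that do not,
    and a static component of the environment."""

    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 16):
        super().__init__()
        self.action_conditioned = VAEBranch(obs_dim + act_dim, latent_dim)
        self.action_free = VAEBranch(obs_dim, latent_dim)
        self.static = VAEBranch(obs_dim, latent_dim)

    def forward(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        z_ac = self.action_conditioned(torch.cat([obs, actions], dim=-1))
        z_af = self.action_free(obs)
        z_st = self.static(obs)
        return torch.cat([z_ac, z_af, z_st], dim=-1)  # merged latent state


class MonotonicMixer(nn.Module):
    """QMIX-style mixer: combines per-agent utilities with non-negative weights
    so the joint Q is monotonic in each agent's Q."""

    def __init__(self, n_agents: int, state_dim: int):
        super().__init__()
        self.hyper_w = nn.Linear(state_dim, n_agents)  # weights conditioned on the latent state
        self.hyper_b = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        w = torch.abs(self.hyper_w(z))                  # non-negativity enforces monotonicity
        return (w * agent_qs).sum(-1, keepdim=True) + self.hyper_b(z)


if __name__ == "__main__":
    n_agents, obs_dim, act_dim, latent_dim = 3, 10, 5, 16
    world_model = DisentangledWorldModel(obs_dim, act_dim, latent_dim)
    mixer = MonotonicMixer(n_agents, 3 * latent_dim)

    obs = torch.randn(1, obs_dim)                       # (batch, obs_dim)
    actions = torch.randn(1, act_dim)
    z = world_model(obs, actions)                       # merged latent from the three branches
    agent_qs = torch.randn(1, n_agents)                 # per-agent utilities (placeholder)
    q_tot = mixer(agent_qs, z)                          # joint action-value estimate
    print(q_tot.shape)                                  # torch.Size([1, 1])
```

In this sketch the world model's merged latent plays the role of the state input to the mixer, so imagined rollouts from the model could in principle be used to train the joint action-value without further environment samples, which is the sample-efficiency argument the abstract makes.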
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Mingsheng_Long2
Submission Number: 2719