Abstract: In this paper, we propose a novel model-based multi-agent reinforcement learning approach, the Value Decomposition Framework with Disentangled World Model, to address the challenge of enabling multiple agents interacting in the same environment to achieve a common goal with reduced sample complexity. Owing to the scalability and non-stationarity problems posed by multi-agent systems, model-free methods require a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the complicated environment dynamics. Our model produces imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is then combined with a value-based framework to predict the joint action-value function and optimize the overall training objective. Experimental results on the StarCraft II micro-management, Multi-Agent MuJoCo, and Level-Based Foraging challenges demonstrate that our method achieves high sample efficiency and outperforms other baselines across a wide range of multi-agent learning tasks.
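To make the three-branch world model described in the abstract concrete, the following is a minimal illustrative sketch in PyTorch. It assumes a simple VAE-style encoder and a single combined latent transition; all module names, dimensions, and the way the branches are merged are hypothetical and are not taken from the paper.

```python
# Minimal sketch of a disentangled latent world model with action-conditioned,
# action-free, and static branches. All names and sizes are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class DisentangledWorldModel(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        # VAE-style encoder: maps an observation to latent mean and log-variance.
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)
        # Action-conditioned branch: latent transition driven by the joint action.
        self.action_cond = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Action-free branch: dynamics independent of the agents' actions.
        self.action_free = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Static branch: time-invariant context shared across the trajectory.
        self.static = nn.Parameter(torch.zeros(latent_dim))
        # Decoder: reconstructs the next observation from the combined latent.
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, obs, action):
        mean, log_var = self.encoder(obs).chunk(2, dim=-1)
        z = mean + torch.randn_like(mean) * (0.5 * log_var).exp()  # reparameterization trick
        # Sum the three branches to form the predicted next latent state.
        z_next = (self.action_cond(torch.cat([z, action], dim=-1))
                  + self.action_free(z) + self.static)
        return self.decoder(z_next), mean, log_var

# Usage: imagine one step ahead for a batch of observations and joint actions.
model = DisentangledWorldModel(obs_dim=10, act_dim=4)
obs, act = torch.randn(8, 10), torch.randn(8, 4)
next_obs_pred, mean, log_var = model(obs, act)
```

In this sketch the branches are simply summed; the paper's method may combine them differently and additionally uses variational graph auto-encoders and a value-decomposition head, which are omitted here for brevity.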
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: - **Section 3, Background**: Added a new subsection (Section 3.3) to introduce the concepts of structured variational inference and variational lower bound.
- **Section 4, Method**: Added sufficient analysis to clearly distinguish between our contributions and those adopted from the literature. Discussed the connections and differences between our work and important previous works in Sections 4.1.2 and 4.1.4.
- **Section 6, Conclusion**: Included the intuitive visualization of disentanglement learning as a future direction.
Assigned Action Editor: ~Mingsheng_Long2
Submission Number: 2719