Acting Beyond Learning: Imagination-Assisted Decision-Making in the Visual-based Multi-Agent Cooperative Scenarios

Published: 01 Jan 2025, Last Modified: 16 May 2025, Venue: AAAI 2025, License: CC BY-SA 4.0
Abstract: Learning optimal policies in multi-agent cooperative settings with visual observations is important but challenging. Agents must first learn state representations from their image observations and then learn policies in the abstracted state space. To address this problem, we propose a novel model-based multi-agent reinforcement learning (MARL) method named Contrastive Latent World for Policy Optimization (CLWPO). In CLWPO, we first design a state representation model to facilitate learning in the latent state space. With the support of this model, we construct a latent world and introduce a contrastive variational bound (CVB) to optimize it. Subsequently, we develop a heuristic policy optimization (HPO) scheme that combines model-free learning with model-based planning to obtain robust policies that predict future behaviors. In particular, during planning, we maintain a queue of teammate models and calculate an adaptive rollout length for each agent to support its self-imagination and reduce the model-based return discrepancy. Finally, we conduct extensive experiments on the PettingZoo benchmark, and the results show that CLWPO significantly enhances learning efficiency and improves agent performance compared to state-of-the-art MARL methods.
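The abstract does not state the form of the contrastive variational bound. As a generic illustration of what a contrastive objective over latent states can look like, the following is a minimal sketch of the widely used InfoNCE loss in NumPy. It is not the CVB from the paper; the function name `info_nce_loss`, the `temperature` parameter, and the use of cosine similarity are all assumptions made for this example.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative only, not the paper's CVB).

    Row i of `positives` is treated as the positive sample for row i of
    `anchors`; every other row in the batch acts as a negative.
    """
    # Normalize rows so the dot product below is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability assigned to the true pair.
    return float(-np.mean(np.diag(log_probs)))


# Hypothetical latent states: 8 samples with 16 dimensions each.
rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 16))
next_latents = latents + 0.01 * rng.normal(size=latents.shape)
loss = info_nce_loss(latents, next_latents)
```

In this sketch the loss is small when each latent is most similar to its own paired sample and grows when the pairing is broken, which is the behavior a contrastive objective relies on.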