Keywords: Multi-agent Reinforcement Learning, Representation Learning, Reinforcement Learning
Abstract: Vision-based multi-agent reinforcement learning (MARL) suffers from poor sample efficiency, limiting its practicality in real-world systems. Representation learning with auxiliary tasks can improve efficiency; however, existing methods, including contrastive learning, often require careful design of a similarity function and increase architectural complexity. In contrast, reconstruction-based methods built on autoencoders are simple and effective for representation learning, yet remain underexplored in MARL. We revisit this direction and identify unstable representation updates as a key challenge limiting sample efficiency and stability in MARL. To address this challenge, we propose the Multi-agent Trust Region Variational Autoencoder (MA-TRVAE), which stabilizes latent representations by constraining their updates within a trust region. Combined with a state-of-the-art MARL algorithm, MA-TRVAE improves sample efficiency, stability, and scalability in vision-based multi-agent control tasks. Experiments demonstrate that this simple approach not only outperforms prior vision-based MARL methods but also surpasses MARL algorithms trained on proprioceptive states. Furthermore, our method scales to larger numbers of agents with only slight performance degradation, while being more computationally efficient than the underlying MARL algorithm.
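The abstract does not spell out the MA-TRVAE objective, but the core idea of constraining latent-representation updates within a trust region can be illustrated. Below is a minimal PyTorch sketch, assuming a standard Gaussian VAE whose loss is augmented with a KL penalty against a frozen snapshot of the previous encoder; every name and hyperparameter here (TrustRegionVAE, trvae_loss, beta, delta) is hypothetical and not taken from the paper.

```python
# Sketch of a trust-region-regularized VAE update (assumed formulation, not
# the paper's actual MA-TRVAE objective): reconstruction + prior KL + a hinge
# penalty on the KL divergence between the current and previous posteriors.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

class TrustRegionVAE(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_std = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim)
        )

    def posterior(self, obs):
        h = self.encoder(obs)
        return Normal(self.mu(h), self.log_std(h).exp())

def trvae_loss(model, old_model, obs, beta=1e-3, delta=0.1):
    q = model.posterior(obs)
    z = q.rsample()                                    # reparameterized sample
    recon = F.mse_loss(model.decoder(z), obs)          # reconstruction term
    prior_kl = kl_divergence(q, Normal(0.0, 1.0)).sum(-1).mean()
    with torch.no_grad():                              # frozen previous encoder
        q_old = old_model.posterior(obs)
    # Trust-region term: penalize posterior drift only beyond threshold delta,
    # which keeps successive latent representations close to each other.
    tr_kl = kl_divergence(q, q_old).sum(-1).mean()
    return recon + beta * prior_kl + F.relu(tr_kl - delta)

# Usage: keep a frozen snapshot of the encoder and refresh it periodically.
model = TrustRegionVAE()
old_model = copy.deepcopy(model).eval()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(32, 64)                              # dummy agent observations
loss = trvae_loss(model, old_model, obs)
opt.zero_grad(); loss.backward(); opt.step()
```

The hinge form `relu(tr_kl - delta)` leaves updates unconstrained while the posterior stays within the trust region and only penalizes drift past it; a hard KL constraint or a fixed penalty coefficient would be plausible alternatives.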
Primary Area: reinforcement learning
Submission Number: 18434