Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Learning to cooperate in distributed, partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE$^2$ algorithm. In SMPE$^2$, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policy that encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE$^2$ outperforms a plethora of state-of-the-art MARL algorithms in complex, fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.
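The authors' implementation is linked below. Purely as an illustration of how the abstract's three ingredients could fit together (a per-agent belief encoder, a belief-conditioned policy, and an adversarially shaped exploration bonus), here is a minimal, hypothetical sketch; all class and function names are assumptions made for this example and are not taken from the SMPE$^2$ repository, and the bonus shown is a simplified reading of the adversarial exploration idea, not the paper's exact objective.

```python
# Hypothetical sketch (not the authors' code): one SMPE^2-style agent with
# (i) a belief encoder mapping the local observation to a latent belief about
# the unobserved state, (ii) a policy conditioned on obs + belief, and
# (iii) an intrinsic bonus rewarding states that teammates' belief models
# still reconstruct poorly.
import torch
import torch.nn as nn


class BeliefEncoder(nn.Module):
    """Maps a local observation to a latent belief of the hidden state."""
    def __init__(self, obs_dim: int, belief_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, belief_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class StateDecoder(nn.Module):
    """Reconstructs a (filtered) view of the joint state from a belief."""
    def __init__(self, belief_dim: int, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(belief_dim, 64), nn.ReLU(), nn.Linear(64, state_dim)
        )

    def forward(self, belief: torch.Tensor) -> torch.Tensor:
        return self.net(belief)


class BeliefConditionedPolicy(nn.Module):
    """Actor that explicitly takes the inferred belief as an extra input."""
    def __init__(self, obs_dim: int, belief_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + belief_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor, belief: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, belief], dim=-1))  # action logits


def adversarial_exploration_bonus(state, others_beliefs, others_decoders):
    """Intrinsic reward: high where teammates' belief models reconstruct the
    visited state poorly, pushing the agent toward novel states whose
    discovery also improves the others' state modelling."""
    with torch.no_grad():
        errors = [
            ((dec(b) - state) ** 2).mean(dim=-1)
            for b, dec in zip(others_beliefs, others_decoders)
        ]
    return torch.stack(errors, dim=0).mean(dim=0)


if __name__ == "__main__":
    obs_dim, belief_dim, state_dim, n_actions = 8, 16, 12, 5
    enc = BeliefEncoder(obs_dim, belief_dim)
    pol = BeliefConditionedPolicy(obs_dim, belief_dim, n_actions)
    obs = torch.randn(4, obs_dim)             # batch of local observations
    belief = enc(obs)
    logits = pol(obs, belief)                 # belief-aware action logits
    # Two hypothetical teammates, each with their own belief and decoder:
    others = [(torch.randn(4, belief_dim), StateDecoder(belief_dim, state_dim))
              for _ in range(2)]
    state = torch.randn(4, state_dim)
    bonus = adversarial_exploration_bonus(
        state, [b for b, _ in others], [d for _, d in others])
    print(logits.shape, bonus.shape)
```

In a full training loop this bonus would be added to the environment reward, while each agent's encoder/decoder pair is trained to reduce its own reconstruction error; the sketch omits those updates for brevity.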
Lay Summary: Cooperating in complex environments is tough for AI agents, especially when they can’t see the whole picture or talk to each other. This paper tackles that challenge by helping agents better understand their surroundings using only what they individually observe. The key idea is to create a smarter way for each agent to guess what’s going on in the environment and to use that guess to make better decisions—both for exploring and working together with others. We introduce a new approach called SMPE², which gives agents two big advantages. First, it helps them build better internal representations (or "beliefs") about the world. Second, it trains them to explore in a way that helps both themselves and their teammates discover useful parts of the environment. Our method makes agents not just smarter individually, but better at teamwork. Tests on standard AI cooperation tasks show that SMPE² beats existing top-performing methods, especially in challenging, fully cooperative scenarios.
Link To Code: https://github.com/ddaedalus/smpe
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Cooperative Multi-Agent Reinforcement Learning, State Modelling, Sparse Reward
Submission Number: 9868