R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: Improving multi-agent reinforcement learning by introducing a new information-theoretic metric that enables diversity through world models
Abstract: Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods derive roles exclusively from an agent's past experience during training, neglecting the influence of those roles on the agent's future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes this objective in two steps: it first derives intermediate roles through contrastive learning on past trajectories, then uses those roles to shape intrinsic rewards that, through a learned dynamics model, promote diversity in future behaviors across different roles. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
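As a rough illustration of how contrastive learning can maximize a mutual-information objective between roles and trajectories, the sketch below computes an InfoNCE-style loss over paired embeddings. The abstract does not state which estimator R3DM uses, so InfoNCE is an assumption here, and the names `info_nce`, `roles`, `trajs`, and `temperature` are hypothetical and not taken from the R3DM code base.

```python
import math


def info_nce(roles, trajs, temperature=0.1):
    """InfoNCE loss over paired (role, trajectory) embeddings.

    Assumption: roles[i] and trajs[i] form the positive pair, and every
    trajs[j] with j != i acts as a negative. For N pairs, log(N) - loss is a
    lower bound on the mutual information between the two embeddings.
    """
    def normalize(vec):
        # Scale to unit length so the dot product is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    roles = [normalize(r) for r in roles]
    trajs = [normalize(t) for t in trajs]

    total = 0.0
    for i, role in enumerate(roles):
        # Similarity of this role to every trajectory in the batch.
        logits = [sum(a * b for a, b in zip(role, t)) / temperature for t in trajs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching trajectory as the correct class.
        total += log_denominator - logits[i]
    return total / len(roles)


# Hypothetical toy batch: two agents with clearly distinct roles.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this loss pulls each role embedding toward its own agent's trajectory embedding and away from the trajectories of other agents. The intrinsic-reward and dynamics-model components described in the abstract are not covered by this sketch.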
Lay Summary: Multi-agent reinforcement learning (MARL) is a type of artificial intelligence where multiple agents (like robots or self-driving cars) learn to work together to achieve shared goals. Inspired by how animals and humans naturally take on different roles to cooperate, researchers have developed methods for these AI agents to learn roles too. However, most existing approaches identify the roles for agents based only on what they have done in the past. This paper introduces a new idea: the roles that agents take on should influence their future behavior so they can coordinate better as a team. We present a novel MARL method, R3DM, which helps agents discover and take on roles that are not only based on their past experience but also designed to shape their future behavior. R3DM encourages agents to behave differently from each other. These behavioral differences enable agents to learn a broader set of specialized behaviors that help the team succeed. Evaluations on simulated multi-player games based on StarCraft show that R3DM helped agents coordinate much better, leading to up to 20% more wins than previous methods.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Multi-agent Reinforcement Learning
Submission Number: 7359