Keywords: Scaling Environments, Multi-Agent Reinforcement Learning (MARL), Multi-Agent Coordination, Emergent Communication, Intention Communication, Policy Learning through Interaction, Generalization
TL;DR: This paper demonstrates that for multi-agent coordination in scaled environments, a structured protocol where agents communicate explicit plans is significantly more robust and scalable than a purely emergent protocol learned from scratch.
Abstract: The scaling of environmental complexity is a critical frontier for advancing multi-agent intelligence. As environments grow in size, dimensionality, and partial observability, agents require sophisticated coordination mechanisms to maintain performance. This paper investigates the role of communication in such scaled environments by comparing two distinct strategies in a cooperative multi-agent reinforcement learning (MARL) task: an emergent protocol and an engineered, intention-based protocol. For the emergent approach, we introduce Learned Direct Communication (LDC), where agents equipped with distinct neural network weights learn to generate and interpret messages end-to-end. For the engineered approach, we propose Intention Communication, a structured architecture featuring an Imagined Trajectory Generation Module (ITGM) and a Message Generation Network (MGN) that allows agents to explicitly formulate and share forward-looking plans. The ITGM uses an internal world model and the agent's own policy to generate and share *planned* future trajectories. We evaluate these strategies in a partially observable grid world, progressively scaling the environment's size. Our findings reveal that while emergent communication is viable in simpler settings, its performance degrades sharply as the environment scales. In contrast, the engineered Intention Communication approach demonstrates remarkable robustness, sample efficiency, and high performance, maintaining near-optimal success rates even in significantly larger and more challenging environments. This work underscores that for agents to succeed in increasingly complex, scaled-up interactive settings, structured and explicit coordination mechanisms may be fundamentally more scalable than purely emergent protocols.
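The imagined-trajectory idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the ITGM and MGN are learned modules, whereas `world_model`, `greedy_policy`, `imagine_trajectory`, and `encode_message` below are hypothetical, hand-written stand-ins chosen only to show the data flow (roll the agent's own policy forward through an internal model, then pack the planned positions into a message).

```python
from typing import List, Tuple

State = Tuple[int, int]  # (row, col) position on the grid
Action = str             # "up", "down", "left", "right", or "stay"

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}


def world_model(state: State, action: Action, size: int) -> State:
    """Toy stand-in for a learned world model: deterministic moves clipped to the grid."""
    dr, dc = MOVES[action]
    row = min(max(state[0] + dr, 0), size - 1)
    col = min(max(state[1] + dc, 0), size - 1)
    return (row, col)


def greedy_policy(state: State, goal: State) -> Action:
    """Toy stand-in for the agent's policy: close the row gap first, then the column gap."""
    if state[0] != goal[0]:
        return "down" if state[0] < goal[0] else "up"
    if state[1] != goal[1]:
        return "right" if state[1] < goal[1] else "left"
    return "stay"


def imagine_trajectory(start: State, goal: State, size: int, horizon: int) -> List[State]:
    """Roll the agent's own policy forward through the world model for `horizon` steps."""
    trajectory: List[State] = []
    state = start
    for _ in range(horizon):
        action = greedy_policy(state, goal)
        state = world_model(state, action, size)
        trajectory.append(state)
    return trajectory


def encode_message(trajectory: List[State]) -> List[int]:
    """Hypothetical message format: flatten the planned positions into a list of ints."""
    return [coord for state in trajectory for coord in state]


plan = imagine_trajectory(start=(0, 0), goal=(2, 1), size=5, horizon=4)
message = encode_message(plan)
```

A teammate receiving `message` would see where the sender intends to be over the next few steps, which is the kind of forward-looking information an emergent protocol would have to discover on its own.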
Archival Option: The authors of this submission want it to appear in the archival proceedings.
Submission Number: 20