Abstract: Scalability is a central challenge in multi-agent reinforcement learning (MARL), as real-world applications
often require coordination among tens to hundreds of agents. As multi-agent systems (MAS) scale up, their inherent difficulties (partial observability, non-stationarity, and complex inter-agent dependencies) become increasingly pronounced. Existing approaches typically pursue scalability by encouraging grouped or hierarchical cooperation, but their limited flexibility, which stems from task-specific priors such as predefined role structures, fixed sub-task horizons, or perceivable sub-task boundaries, makes their performance heavily dependent on carefully hand-crafted designs, thereby restricting their effectiveness in large-scale MAS. To address these
limitations, we propose Selective Attention-enhanced Multi-agent Policy Optimization (SAMPO), a concise yet
effective framework for scalable multi-agent policy learning. SAMPO leverages attention scores to reorder each agent's observations, thereby achieving permutation invariance in a simple manner and consequently reducing the complexity of the observation space. This design substantially improves learning efficiency in cooperative tasks involving up to hundreds of agents.
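As a concrete illustration, a minimal PyTorch sketch of score-based reordering is given below; the module and tensor names (ReorderEncoder, self_feat, entity_feats) are our illustrative assumptions, not identifiers from the paper, and the exact scoring function SAMPO uses may differ.

```python
import torch
import torch.nn as nn

class ReorderEncoder(nn.Module):
    """Sorts observed entities by their attention score w.r.t. the agent's
    own embedding, yielding a permutation-invariant input ordering.
    A sketch under our own assumptions, not the paper's implementation."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)  # query from agent's own features
        self.k_proj = nn.Linear(d_model, d_model)  # keys from observed entities

    def forward(self, self_feat, entity_feats):
        # self_feat: (batch, d_model); entity_feats: (batch, n_entities, d_model)
        q = self.q_proj(self_feat).unsqueeze(1)          # (batch, 1, d)
        k = self.k_proj(entity_feats)                    # (batch, n, d)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5    # scaled dot-product scores, (batch, n)
        order = scores.argsort(dim=-1, descending=True)  # rank entities by score
        idx = order.unsqueeze(-1).expand_as(entity_feats)
        return entity_feats.gather(1, idx)               # observations in score order
```

Since the resulting order depends only on the attention scores, any permutation of the observed entities maps to the same sorted sequence (up to ties), which is what reduces the effective observation space.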
Moreover, SAMPO introduces into the attention computation a selection mechanism [5], a module that adaptively selects which interactions or entities to focus on. This mechanism dynamically determines the attention parameter matrices based on each agent's internal state, thereby injecting nonlinearity and greatly enhancing the expressive capacity of the attention encoding.
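One plausible reading of this mechanism, sketched below under our own assumptions, is a hypernetwork-style gating in which the agent's internal state modulates the key and value projections, making the effective attention parameter matrices state-dependent; the names (SelectiveAttention, sel_k, sel_v) and the sigmoid gating are placeholders rather than the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class SelectiveAttention(nn.Module):
    """Attention whose effective key/value projections depend on the agent's
    internal state. An assumption-laden sketch, not the paper's exact design."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # selection hypernetwork: agent state -> per-dimension gates
        self.sel_k = nn.Linear(d_model, d_model)
        self.sel_v = nn.Linear(d_model, d_model)

    def forward(self, agent_state, entity_feats):
        # agent_state: (batch, d); entity_feats: (batch, n, d)
        q = self.q_proj(agent_state).unsqueeze(1)                  # (batch, 1, d)
        g_k = torch.sigmoid(self.sel_k(agent_state)).unsqueeze(1)  # state-dependent gates
        g_v = torch.sigmoid(self.sel_v(agent_state)).unsqueeze(1)
        k = self.k_proj(entity_feats) * g_k  # gating makes the effective W_K nonlinear in the state
        v = self.v_proj(entity_feats) * g_v  # likewise for the effective W_V
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)  # (batch, n)
        return torch.einsum('bn,bnd->bd', attn, v)                 # aggregated encoding
```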
By virtue of these designs, SAMPO eliminates the need for extensive manual tuning and hand-crafted coordination structures, demonstrating
remarkable performance in large-scale multi-agent tasks. Empirical results show that, under a unified set of hyperparameters, SAMPO consistently outperforms state-of-the-art baselines across SMAC environments of varying
scales, including those involving up to hundreds of agents.