SHEP: Spatial Heterogeneity–Driven Experience Prioritization in Scalable Multi-Agent Reinforcement Learning

TMLR Paper6777 Authors

02 Dec 2025 (modified: 15 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: Scalable Multi-Agent Reinforcement Learning (MARL) faces two severe challenges as the number of agents increases: the joint state-action space grows exponentially, and global coordination becomes harder. Traditional methods optimize fine-grained individual strategies within this exponentially large space, which leads to low sample efficiency and training bottlenecks in large-scale scenarios. To address these issues, this paper proposes SHEP (Spatial Heterogeneity–Driven Experience Prioritization), a mesoscopic guidance framework for large-scale group coordination. SHEP uses Occupancy Entropy, Action Diversity Entropy, and Moran's I to construct a set of topological feature descriptors that map the high-dimensional individual state space into a low-dimensional, interpretable group feature space. Building on these descriptors, we design heterogeneity-driven prioritized experience replay and Group Hindsight Experience Replay (Group-HER). By identifying critical moments at which spatial heterogeneity changes abruptly or highly structured clustering emerges, these mechanisms select high-value samples and prune the ineffective exploration space (a form of "dimensionality reduction pruning"), improving sample efficiency. Because its experience screening mechanism is general, SHEP can be integrated as a plug-in into mainstream centralized training algorithms such as MAPPO without altering their underlying policy optimization objectives. In MAgent environments and on SMAC benchmarks, SHEP converges faster and reaches higher final win rates than baseline methods such as QMIX and Mean-Field approaches. These results support the view that introducing explicit spatial heterogeneity features to guide experience prioritization is an effective paradigm for mitigating the curse of dimensionality in scalable MARL.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhongwen_Xu1
Submission Number: 6777
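
The abstract names three group-level descriptors (Occupancy Entropy, Action Diversity Entropy, and Moran's I) but does not give their definitions. The sketch below shows one plausible way to compute such descriptors for agents on a 2D grid. It is not the authors' implementation: the function names, the grid discretization, the use of natural-log Shannon entropy, and the binary 4-neighbour weights for Moran's I are all assumptions made for illustration.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy (in nats) of a histogram of non-negative counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())

def occupancy_grid(positions, grid_shape):
    """Count agents per cell; positions are integer (row, col) pairs."""
    grid = np.zeros(grid_shape, dtype=float)
    for r, c in positions:
        grid[r, c] += 1
    return grid

def morans_i(grid):
    """Moran's I with binary rook (4-neighbour) weights and no wrap-around.

    Assumed weighting scheme; the paper may use a different one.
    """
    z = grid - grid.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return 0.0  # constant grid: spatial autocorrelation is undefined
    # Each neighbouring pair is counted in both directions (w_ij = w_ji = 1).
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    rows, cols = grid.shape
    w_sum = 2.0 * (rows * (cols - 1) + (rows - 1) * cols)
    return float(grid.size / w_sum * cross / denom)

def group_descriptors(positions, actions, grid_shape, n_actions):
    """Hypothetical 3-dimensional group feature vector for one timestep."""
    grid = occupancy_grid(positions, grid_shape)
    action_counts = np.bincount(np.asarray(actions), minlength=n_actions)
    return np.array([
        shannon_entropy(grid),           # occupancy entropy
        shannon_entropy(action_counts),  # action diversity entropy
        morans_i(grid),                  # spatial autocorrelation
    ])
```

Under these assumptions, a checkerboard occupancy pattern yields a Moran's I of -1 (perfect dispersion), while agents packed into one half of the grid yield a positive value. A large step-to-step change in this vector is the kind of signal the abstract describes as marking a critical moment.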