Keywords: multi-agent system, multi-agent reinforcement learning, communication
Abstract: It has long been recognized that cooperative multi-agent reinforcement learning (MARL) faces scalability challenges, as the state-action space grows exponentially with the number of agents. Existing approaches typically improve scalability by filtering out communication with low-relevance agents. However, such filtering often relies on global state and a fixed graph distribution, limiting adaptability in large-scale, communication-constrained environments. In this paper, we propose a scalable MARL method, called Adaptive Communication Range PPO (ACR-PPO), that models communication-based decision-making under communication budget constraints as a sequential process: a communication policy first selects each agent's communication range within a given budget, and a behavior policy then chooses actions based on the neighbors that range includes. More importantly, we provide a theoretical guarantee of monotonic performance improvement under communication budget constraints. Experimental results across diverse scenarios show that our approach preserves policy performance while significantly reducing communication cost through adaptive range control.
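The two-stage process the abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm: the greedy range assignment, the 1-D positions, and the averaging behavior policy are all stand-in assumptions used only to show the sequencing (range selection under a budget, then action selection over the included neighbors).

```python
import random

def select_ranges(num_agents, max_range, budget):
    """Stand-in communication policy: assign each agent a communication
    range so that the total (the communication budget) is never exceeded."""
    ranges = []
    remaining = budget
    for _ in range(num_agents):
        r = random.randint(0, min(max_range, remaining))
        ranges.append(r)
        remaining -= r
    return ranges

def neighbors_within(positions, i, comm_range):
    """Indices of agents within comm_range of agent i (1-D for simplicity)."""
    return [j for j, p in enumerate(positions)
            if j != i and abs(p - positions[i]) <= comm_range]

def behavior_action(own_obs, neighbor_obs):
    """Stand-in behavior policy: act on the mean of included observations."""
    vals = [own_obs] + neighbor_obs
    return sum(vals) / len(vals)

positions = [0.0, 1.0, 2.5, 4.0]
budget = 4
ranges = select_ranges(len(positions), max_range=2, budget=budget)
actions = [
    behavior_action(positions[i],
                    [positions[j] for j in neighbors_within(positions, i, ranges[i])])
    for i in range(len(positions))
]
```

By construction, `sum(ranges)` never exceeds the budget, mirroring the constraint under which the paper's monotonic-improvement guarantee is stated; in ACR-PPO both stages would be learned policies rather than the random and averaging stand-ins above.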
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23951