Distributed Constrained Multi-Agent Reinforcement Learning with Consensus and Networked Communication

Santiago Amaya-Corredor; Miguel Calvo-Fullana; Anders Jonsson

Published: 01 Aug 2024, Last Modified: 09 Oct 2024, Venue: EWRL17, License: CC BY 4.0

Keywords: Multi-Agent Reinforcement Learning, Constrained Reinforcement Learning, Consensus, Distributed Optimisation, Networked Communication

TL;DR: Presenting a scalable, decentralized multi-agent reinforcement learning algorithm with consensus to address coordination and operational constraints in distributed systems

Abstract: Our research addresses the scalability and coordination challenges inherent to distributed multi-agent systems (MAS) operating under operational constraints. We introduce a novel Constrained Multi-Agent Reinforcement Learning (CMARL) algorithm that integrates a consensus mechanism to ensure agent coordination. Our decentralized approach allows each agent to independently optimize its local rewards while adhering to global constraints, which are evaluated via secondary rewards. These secondary rewards act as a coupling mechanism that penalizes non-cooperative behavior. Agents operate within a communication network modeled as an undirected graph and exchange information solely with their immediate neighbors to dynamically update dual variables. We validate the algorithm on the economic dispatch problem in smart grid management, demonstrating its scalability and practical utility in optimizing energy distribution under operational constraints. Experimental results show that our method effectively balances global and local objectives, indicating its robustness in real-world, distributed settings. The key contributions of this work are: (i) a CMARL algorithm that achieves long-term constraint satisfaction and agent consensus, (ii) improved scalability of policy training through problem factorization based on observed state distributions, and (iii) the successful application of our algorithm to a smart grid management use case, highlighting its practical applicability and effectiveness in managing distributed energy resources.
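The abstract does not spell out the dual update rule, so the following is only a minimal sketch of one common pattern that fits the description: each agent averages its dual variable with those of its immediate neighbors over an undirected graph, then takes a projected ascent step on its local constraint violation. The Metropolis mixing weights, the four-agent ring, the step size, and all function names are assumptions made for illustration, not the authors' algorithm.

```python
# Illustrative sketch: a generic consensus-plus-dual-ascent round.
# All names, the ring graph, and the weights are hypothetical.

def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic mixing matrix for an undirected graph."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The diagonal is still zero here, so this absorbs the remaining mass.
        W[i][i] = 1.0 - sum(W[i])
    return W


def consensus_dual_step(lam, W, neighbors, violation, step_size):
    """One round: mix with immediate neighbors, then projected dual ascent."""
    new_lam = []
    for i in range(len(lam)):
        mixed = W[i][i] * lam[i] + sum(W[i][j] * lam[j] for j in neighbors[i])
        # Projection onto [0, inf) keeps the dual variable non-negative.
        new_lam.append(max(0.0, mixed + step_size * violation[i]))
    return new_lam


# Hypothetical 4-agent ring: each agent talks only to its two neighbors.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
W = metropolis_weights(ring)

# With zero violation the update reduces to pure consensus averaging.
lam = [0.0, 1.0, 2.0, 3.0]
for _ in range(200):
    lam = consensus_dual_step(lam, W, ring, [0.0] * 4, step_size=0.1)
```

Because the mixing matrix in this sketch is symmetric and doubly stochastic, repeated rounds with zero violation drive every agent's dual variable toward the network-wide average using only neighbor-to-neighbor exchanges. A positive local violation raises that agent's multiplier, and mixing then spreads the increase through the network.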

Submission Number: 73
