Policy Consensus-Based Distributed Deterministic Multi-Agent Reinforcement Learning Over Directed Graphs

Published: 01 Jan 2025. Last Modified: 04 Nov 2025. IEEE Trans. Artif. Intell. 2025. License: CC BY-SA 4.0
Abstract: Learning efficient coordination policies over continuous state and action spaces remains a major challenge for existing distributed multi-agent reinforcement learning (MARL) algorithms. In this article, the classic deterministic policy gradient (DPG) method is extended to the distributed MARL setting to address continuous control policy learning for a team of homogeneous agents connected through a directed graph. A theoretical on-policy distributed actor–critic algorithm is first proposed based on a local DPG theorem, which considers observation-based policies and incorporates consensus updates for the critic and actor parameters. Stochastic approximation theory is then used to establish asymptotic convergence of the algorithm under standard assumptions. Thereafter, a practical distributed deterministic actor–critic algorithm is proposed by integrating the theoretical algorithm with a deep reinforcement learning training architecture, which improves scalability, exploration ability, and data efficiency. Simulations are carried out in standard MARL environments with continuous action spaces, and the results demonstrate that the proposed distributed algorithm achieves learning performance comparable to strong centrally trained baselines while requiring far fewer communication resources.
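The abstract describes consensus updates of actor and critic parameters over a directed communication graph. The sketch below illustrates the general idea of such a consensus step, not the authors' algorithm: each agent mixes its parameter vector with those of its in-neighbors through a row-stochastic weight matrix and then takes a local gradient step. The ring topology, the mixing weights W, and the placeholder local_critic_grad function are illustrative assumptions only.

```python
# Minimal sketch (assumed setup, not the paper's implementation) of one
# consensus-plus-local-gradient update for per-agent critic parameters
# over a directed ring graph.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 8

# Row-stochastic mixing matrix for a directed ring: each agent averages its
# own parameters with those received from its single in-neighbor.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.5  # weight on the in-neighbor

theta = rng.normal(size=(n_agents, dim))  # per-agent critic parameters
step_size = 0.05

def local_critic_grad(theta_i):
    # Hypothetical placeholder for the local critic gradient an agent would
    # compute from its own observations and rewards; here a dummy pull toward 0.
    return -theta_i

for t in range(200):
    mixed = W @ theta  # consensus step: mix parameters from in-neighbors
    theta = mixed + step_size * np.array(
        [local_critic_grad(mixed[i]) for i in range(n_agents)]
    )

# With a strongly connected graph and suitable weights, the per-agent
# parameter vectors are driven toward agreement (consensus).
print("max disagreement:", np.max(np.abs(theta - theta.mean(axis=0))))
```

In this toy example the disagreement between agents shrinks over iterations; the paper's theoretical results instead rely on stochastic approximation arguments under standard assumptions to establish asymptotic convergence of the full actor–critic updates.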