Multi-robot Cooperation Learning Based on Powell Deep Deterministic Policy Gradient

Zongyuan Li, Chuxi Xiao, Ziyi Liu, Xian Guo

Published: 2022, Last Modified: 15 May 2025, ICIRA (2) 2022, CC BY-SA 4.0

Abstract: Model-free deep reinforcement learning algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. However, these methods often perform poorly in multi-agent environments due to the instability of teammates’ strategies. In this paper, a novel reinforcement learning method called Powell Deep Deterministic Policy Gradient (PDDPG) is proposed, which integrates Powell’s unconstrained optimization method with deep deterministic policy gradient. Specifically, each agent is regarded as a one-dimensional variable, and the process of multi-robot cooperation learning corresponds to searching for an optimal vector. A conjugate direction is constructed as in Powell’s method and is used to update the agents’ policies. Finally, the proposed method is validated in a dogfight-like multi-agent environment. The results suggest that the proposed method substantially outperforms independent Deep Deterministic Policy Gradient (IDDPG), revealing a promising way to realize high-quality independent learning.
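For readers unfamiliar with the optimization method the abstract refers to, the following is a minimal sketch of classical Powell's conjugate direction method on a toy two-variable function. It is not PDDPG itself: the objective `f`, the golden-section `line_search`, the search interval, and all tolerances are illustrative assumptions, and each coordinate merely stands in for "one agent treated as a one-dimensional variable". How the paper maps policies and returns onto this scheme is not specified in the abstract.

```python
import math

def line_search(f, x, d, lo=-10.0, hi=10.0, tol=1e-8):
    """Golden-section search for the step t in [lo, hi] minimizing f(x + t*d).

    Assumes f is unimodal along the line (true for the convex toy objective below).
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    g = lambda t: f([xi + t * di for xi, di in zip(x, d)])
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if g(c) < g(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    t = (a + b) / 2
    return [xi + t * di for xi, di in zip(x, d)]

def powell(f, x0, n_rounds=20, tol=1e-6):
    """Basic Powell iteration: sweep the current directions, then search along
    the net displacement and swap it in for the oldest direction."""
    n = len(x0)
    # Start from the coordinate axes (one axis per "agent" in the analogy).
    directions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(n_rounds):
        x_start = list(x)
        for d in directions:
            x = line_search(f, x, d)  # optimize one direction, others fixed
        new_dir = [xe - xs for xe, xs in zip(x, x_start)]
        norm = math.sqrt(sum(v * v for v in new_dir))
        if norm < tol:
            break  # the sweep no longer moves the point
        new_dir = [v / norm for v in new_dir]
        x = line_search(f, x, new_dir)
        directions = directions[1:] + [new_dir]
    return x

# Toy coupled objective (hypothetical): neither variable can be optimized alone.
def f(x):
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]

x_min = powell(f, [0.0, 0.0])
print(x_min)
```

Setting the gradient of this toy objective to zero gives the minimizer (8/3, -10/3), which the sketch should approach. The coupling term `x[0] * x[1]` is what makes plain one-variable-at-a-time updates slow and the displacement direction useful, which is loosely analogous to the teammate-dependence issue the abstract describes.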