Multi-robot Cooperation Learning Based on Univariate Search Technique and Deep Deterministic Policy Gradient

Chuxi Xiao, Chenheng Zhang, Xian Guo

Published: 01 Jan 2022, Last Modified: 15 May 2025, Venue: ACIRS 2022, License: CC BY-SA 4.0

Abstract: Directly applying single-agent reinforcement learning (SARL) methods to multi-agent tasks is difficult because good cooperation is often hard to achieve. In this paper, a novel training framework called Uni-DDPG, based on the Univariate Search Technique and Deep Deterministic Policy Gradient (DDPG), is proposed to learn cooperation between two agents. Specifically, the policy of one agent is fixed while DDPG is used to learn the policy of the other agent; the learned policy is then fixed and the previously fixed policy is optimized with DDPG, and the two steps alternate in this way. In addition, to improve the robustness and efficiency of learning, the Win-or-Learn-Fast method is adopted to update the parameters. Finally, a two-robot cooperation task is built and the proposed method is used to learn the optimal policies. Results show that the two robots learn to cooperate with the proposed method, which substantially outperforms independent DDPG.
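The alternating scheme described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: each agent's policy is reduced to a single scalar parameter, DDPG is replaced by plain gradient ascent on a hypothetical concave `joint_reward`, and the Win-or-Learn-Fast idea is approximated by a simple rule (`wolf_step_size`, a hypothetical helper) that uses a small step when the current reward is at least its running average and a larger step otherwise. All names and constants below are illustrative.

```python
def joint_reward(a: float, b: float) -> float:
    """Hypothetical cooperative reward shared by both agents.

    Concave quadratic with a coupling term; its maximum is at a=0, b=2.
    """
    return -(a - 1.0) ** 2 - (b - 2.0) ** 2 - a * b


def grad(a: float, b: float, which: int, eps: float = 1e-5) -> float:
    """Central finite-difference gradient w.r.t. one agent's parameter."""
    if which == 0:
        return (joint_reward(a + eps, b) - joint_reward(a - eps, b)) / (2 * eps)
    return (joint_reward(a, b + eps) - joint_reward(a, b - eps)) / (2 * eps)


def wolf_step_size(current: float, average: float,
                   lr_win: float = 0.05, lr_lose: float = 0.2) -> float:
    """Simplified Win-or-Learn-Fast rule (assumption, not the paper's exact rule).

    Learn cautiously when "winning" (reward at or above its running average),
    learn fast when "losing".
    """
    return lr_win if current >= average else lr_lose


def alternating_training(rounds: int = 50, inner_steps: int = 20):
    """Univariate-search-style training: update one agent while the other is fixed."""
    params = [3.0, -3.0]                      # [agent 0 parameter, agent 1 parameter]
    avg_reward = joint_reward(*params)        # running average of the joint reward
    for r in range(rounds):
        learner = r % 2                       # the agent being trained this round
        for _ in range(inner_steps):          # stand-in for a DDPG training phase
            current = joint_reward(*params)
            lr = wolf_step_size(current, avg_reward)
            params[learner] += lr * grad(params[0], params[1], learner)
            avg_reward += 0.1 * (current - avg_reward)
    return params[0], params[1]


if __name__ == "__main__":
    a, b = alternating_training()
    print(f"agent 0: {a:.4f}, agent 1: {b:.4f}, reward: {joint_reward(a, b):.4f}")
```

In this toy setting each round only changes one parameter, which mirrors the fix-one, train-the-other structure; in the actual framework the inner loop would be a full DDPG training phase for one robot's actor and critic.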