MARRGM: Learning Framework for Multi-Agent Reinforcement Learning via Reinforcement Recommendation and Group Modification
Abstract: Sample usage efficiency is an important factor in the convergence speed of multi-agent deep reinforcement learning (MADRL) algorithms. Most existing experience replay (ER) methods manually select experience samples to update the agents' policies. Such methods struggle to supply suitable and efficient experience samples at different stages of policy learning and to fully exploit the potential value of the samples in the replay buffer. Inspired by recommendation systems, this paper proposes a MADRL framework based on reinforcement recommendation and group modification to improve sample usage efficiency and the multi-agent system's ability to find optimal solutions across different categories of task scenarios. First, a recommendation network outputs a sampling probability for each experience sample, replacing manual sampling with recommendation-based sampling; at the same time, we record the performance of the multi-agent system after its policy is updated with the recommended samples and use this feedback to drive the reinforcement learning of the recommendation network. Next, we modify each agent's individual policy according to the group reward to improve its ability to learn the optimal solution. We then embed the combined reinforcement recommendation and group modification modules into the MADRL algorithm MAAC. Finally, we evaluate the framework on task scenarios including cooperative collection, command movement, and target navigation, and extend it to the MADDPG algorithm to verify its scalability. The experimental results show that off-policy MADRL algorithms combined with the proposed framework outperform the baseline algorithms in sample usage efficiency and generalize better across different numbers of agents and scenario categories.
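To make the recommendation-sampling idea concrete, below is a minimal PyTorch sketch of recommendation-based experience sampling with a reinforcement-learning update of the recommender. It is not the paper's implementation: the network architecture, the feature representation of buffered samples, the REINFORCE-style update, and the performance-improvement reward are illustrative assumptions introduced only for this example.

```python
# Hedged sketch of recommendation-based experience sampling (assumed design, not the paper's code).
import torch
import torch.nn as nn

class RecommenderNet(nn.Module):
    """Scores every stored experience sample; scores become sampling probabilities."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (buffer_size, feat_dim) -> per-sample probabilities (buffer_size,)
        logits = self.net(feats).squeeze(-1)
        return torch.softmax(logits, dim=0)

def recommend_batch(recommender, feats, batch_size):
    """Sample a minibatch of buffer indices according to the recommender's probabilities."""
    probs = recommender(feats)
    idx = torch.multinomial(probs, batch_size, replacement=False)
    log_prob = torch.log(probs[idx] + 1e-8).sum()  # kept for the policy-gradient update
    return idx, log_prob

def update_recommender(optimizer, log_prob, perf_before, perf_after):
    """REINFORCE-style update: the reward is the performance improvement of the
    multi-agent system after training on the recommended minibatch (one possible choice)."""
    reward = perf_after - perf_before
    loss = -reward * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Usage sketch with random placeholder data instead of real transition features/returns.
feat_dim, buffer_size, batch_size = 8, 256, 32
recommender = RecommenderNet(feat_dim)
opt = torch.optim.Adam(recommender.parameters(), lr=1e-3)
feats = torch.randn(buffer_size, feat_dim)          # features of buffered samples
idx, log_prob = recommend_batch(recommender, feats, batch_size)
# ... update the MADRL agents (e.g., MAAC or MADDPG) on the transitions at `idx` ...
update_recommender(opt, log_prob, perf_before=0.10, perf_after=0.15)
```

The key design point illustrated here is the feedback loop: the recommender's sampling decisions are rewarded by how much the multi-agent system improves after training on the recommended batch, so sample selection itself is learned rather than fixed by hand.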