Abstract: In multi-agent systems, deep reinforcement learning policy gradient algorithms can converge very slowly, or fail to converge altogether, as the number of agents and the size of the state information grow. We therefore present a masking-based policy gradient algorithm for generalised centralised training and decentralised execution (CTDE). In place of the global state information used by the critic network in the original MADDPG algorithm, we feed the critic the local state information of a randomly selected subset of agents. In addition, we replace the fixed Polyak update of the target network with one whose coefficient is adjusted dynamically and adaptively during training. Under this framework, our approach considerably reduces the training burden on the critic network while preserving the sample efficiency of agent learning and accelerating the discovery of superior joint strategies. Because these two improvements are independent of the base learner, they can be applied to other CTDE-based multi-agent deep reinforcement learning algorithms rather than being limited to the conventional MADDPG algorithm. The code is publicly available at https://github.com/ZVEzhangyu/SMPG-master.
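The following is a minimal, hypothetical sketch of the two ideas summarised above, written in PyTorch: masking the critic input down to the local states of a randomly selected subset of agents, and a Polyak-style target update whose coefficient adapts during training. The function names (`mask_agent_states`, `adaptive_polyak_update`), the zero-masking choice, and the TD-error-based adaptation rule are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch only; not the authors' code.
import random
import torch
import torch.nn as nn


def mask_agent_states(all_states, num_sampled):
    """Keep the local states of a random subset of agents as the critic input.

    all_states: list of per-agent state tensors, each of shape (batch, state_dim).
    num_sampled: number of agents whose states are kept (assumed hyperparameter).
    """
    chosen = set(random.sample(range(len(all_states)), num_sampled))
    # Zero out the non-selected agents so the critic input size stays fixed.
    masked = [s if i in chosen else torch.zeros_like(s) for i, s in enumerate(all_states)]
    return torch.cat(masked, dim=-1)


def adaptive_polyak_update(target_net, online_net, td_error, tau_min=0.001, tau_max=0.05):
    """Soft-update the target network with a coefficient that adapts to the TD error.

    The rule "larger TD error -> larger tau" is an assumption for illustration;
    the abstract only states that the Polyak update is made dynamic and adaptive.
    """
    tau = float(tau_min + (tau_max - tau_min) * torch.tanh(td_error.abs().mean()))
    with torch.no_grad():
        for tp, p in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)


if __name__ == "__main__":
    n_agents, state_dim, batch = 4, 8, 32
    states = [torch.randn(batch, state_dim) for _ in range(n_agents)]
    critic_in = mask_agent_states(states, num_sampled=2)

    critic = nn.Linear(n_agents * state_dim, 1)
    target_critic = nn.Linear(n_agents * state_dim, 1)
    target_critic.load_state_dict(critic.state_dict())

    td_error = critic(critic_in) - target_critic(critic_in).detach()
    adaptive_polyak_update(target_critic, critic, td_error)
```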