Abstract: In complex multi-agent reinforcement learning environments, such as StarCraft II, most existing algorithms struggle to scale to large-scale collaboration tasks. This is partly because learning to precisely control each agent so as to maximize its contribution to the team becomes increasingly difficult as the number of agents and the dimensionality of the input grow. In this paper, we propose the novel Pacesetter LeARning (PLAR) method, which builds and learns pacesetters that enable agents to take the situation of neighboring agents into account and cooperate more effectively in complex, large-scale environments. By leveraging the comprehensive outlook provided by the pacesetters, agents can coordinate their actions more finely and achieve better overall performance. To demonstrate the effectiveness of our algorithm, we conduct experiments on both existing and newly generated scenarios. Specifically, we compare our algorithm against existing multi-agent algorithms on the existing scenarios, and we also optimize the StarCraft Multi-Agent Challenge (SMAC) from quadratic to linear memory complexity, enabling us to construct larger-scale maps for further evaluation. The experimental results demonstrate that PLAR outperforms existing algorithms in large-scale settings.