Abstract: Action-value functions have been widely used in multi-agent reinforcement learning. However, they are difficult to adapt to scenarios such as real-time strategy games, where the number of agents varies over time. In this paper, we explore approaches that avoid action-value functions in order to make multi-agent architectures more scalable. We present a general architecture for real-time strategy games and design a global reward function that fits into it. Within this architecture, we also propose an algorithm, requiring no human knowledge, that handles Semi-Markov Decision Processes, in which rewards are not received until an action has persisted for some duration. To evaluate the performance of our approach, we carry out micromanagement experiments on a simplified real-time strategy game called MicroRTS. The results show that the trained artificial intelligence is highly competitive against strong baseline bots.