Abstract: Action-value functions have been widely used in multi-agent reinforcement learning. However, they are difficult to adapt to scenarios such as real-time strategy games, where the number of agents varies over time. In this paper, we explore approaches that avoid action-value functions in order to make multi-agent architectures more scalable. We present a general architecture for real-time strategy games and design a global reward function that fits into it. Within this architecture, we also propose an algorithm, requiring no human knowledge, that handles Semi-Markov Decision Processes, in which rewards are not received until an action has persisted for some duration. To evaluate the performance of our approach, we carry out micromanagement experiments on a simplified real-time strategy game called MicroRTS. The results show that the trained artificial intelligence is highly competitive against strong baseline bots.