Hierarchical Deep Reinforcement Learning Agent with Counter Self-play  on Competitive Games

Huazhe Xu; Keiran Paster; Qibin Chen; Haoran Tang; Pieter Abbeel; Trevor Darrell; Sergey Levine

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

Huazhe Xu, Keiran Paster, Qibin Chen, Haoran Tang, Pieter Abbeel, Trevor Darrell, Sergey Levine

27 Sept 2018 (modified: 05 May 2023)ICLR 2019 Conference Withdrawn SubmissionReaders: Everyone

Abstract: Deep Reinforcement Learning algorithms lead to agents that can solve difficult decision making problems in complex environments. However, many difficult multi-agent competitive games, especially real-time strategy games are still considered beyond the capability of current deep reinforcement learning algorithms, although there has been a recent effort to change this \citep{openai_2017_dota, vinyals_2017_starcraft}. Moreover, when the opponents in a competitive game are suboptimal, the current \textit{Nash Equilibrium} seeking, self-play algorithms are often unable to generalize their strategies to opponents that play strategies vastly different from their own. This suggests that a learning algorithm that is beyond conventional self-play is necessary. We develop Hierarchical Agent with Self-play (HASP), a learning approach for obtaining hierarchically structured policies that can achieve higher performance than conventional self-play on competitive games through the use of a diverse pool of sub-policies we get from Counter Self-Play (CSP). We demonstrate that the ensemble policy generated by HASP can achieve better performance while facing unseen opponents that use sub-optimal policies. On a motivating iterated Rock-Paper-Scissor game and a partially observable real-time strategic game (http://generals.io/), we are led to the conclusion that HASP can perform better than conventional self-play as well as achieve 77% win rate against FloBot, an open-source agent which has ranked at position number 2 on the online leaderboards.

Keywords: deep reinforcement learning, self-play, real-time strategic game, multi-agent

TL;DR: We develop Hierarchical Agent with Self-play (HASP), a learning approach for obtaining hierarchically structured policies that can achieve high performance than conventional self-play on competitive real-time strategic games.

4 Replies

Loading