StarCraft II Unplugged: Large Scale Offline Reinforcement Learning

Michael Mathieu; Sherjil Ozair; Srivatsan Srinivasan; Caglar Gulcehre; Shangtong Zhang; Ray Jiang; Tom Le Paine; Konrad Zolna; Richard Powell; Julian Schrittwieser; David Choi; Petko Georgiev; Daniel Kenji Toyama; Aja Huang; Roman Ring; Igor Babuschkin; Timo Ewalds; Mahyar Bordbar; Sarah Henderson; Sergio Gómez Colmenarejo; Aaron van den Oord; Wojciech M. Czarnecki; Nando de Freitas; Oriol Vinyals

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop NeurIPS 2021, Readers: Everyone

Keywords: offline rl, starcraft, deep rl, benchmark

TL;DR: We introduce a benchmark for offline RL on StarCraft II, and propose baselines and evaluation protocols.

Abstract: StarCraft II is one of the most challenging reinforcement learning (RL) environments; it is partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active human competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset to establish a benchmark, which we call StarCraft II Unplugged, that introduces unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard’s release), tools standardising an API for ML methods, and an evaluation protocol. We also present baseline agents, including behaviour cloning and offline variants of V-trace actor-critic and MuZero. We find that the variants of those algorithms with behaviour value estimation and single-step policy improvement work best and exceed a 90% win rate against previously published AlphaStar behaviour cloning agents.

Supplementary Material: zip
