Keywords: offline rl, starcraft, deep rl, benchmark
TL;DR: We introduce a benchmark for offline RL on StarCraft II, and propose baselines and evaluation protocols.
Abstract: StarCraft II is one of the most challenging reinforcement learning (RL) environments; it is partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic planning over long-time horizons with real-time low-level execution. It also has an active human competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because a massive dataset of millions of StarCraft II games played by human players has been released by Blizzard. This paper leverages that and establishes a benchmark, which we call StarCraft II Unplugged, that introduces unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard’s release), tools standardising an API for ML methods, and an evaluation protocol. We also present baseline agents, including behaviour cloning, and offline variants of V-trace actor-critic and MuZero. We find that the variants of those algorithms with behaviour value estimation and single step policy improvement work best and exceed 90% win rate against previously published AlphaStar behaviour cloning agents.
Supplementary Material: zip