TL;DR: We reproduced AlphaZero on Google Cloud Platform
Abstract: The reproducibility of reinforcement-learning research has been highlighted as a key challenge area in the field. In this paper, we present a case study in reproducing the results of one groundbreaking algorithm, AlphaZero, a reinforcement learning system that learns how to play Go at a superhuman level given only the rules of the game. We describe Minigo, a reproduction of the AlphaZero system using publicly available Google Cloud Platform infrastructure and Google Cloud TPUs. The Minigo system includes both the central reinforcement learning loop as well as auxiliary monitoring and evaluation infrastructure. With ten days of training from scratch on 800 Cloud TPUs, Minigo can play evenly against LeelaZero and ELF OpenGo, two of the strongest publicly available Go AIs. We discuss the difficulties of scaling a reinforcement learning system and the monitoring systems required to understand the complex interplay of hyperparameter configurations.
2 Replies
Loading