Net2Net Extension for the AlphaGo Zero Algorithm

2019 (modified: 28 Sept 2021), ACG 2019, Readers: Everyone
Abstract: The number of residual blocks in a computer Go program that follows the AlphaGo Zero algorithm is one of the key factors in the program's playing strength. In this paper, we propose a method to deepen the residual network without reducing its performance. Because self-play tends to be the most time-consuming part of AlphaGo Zero training, we then demonstrate how training can continue on the deepened network using the self-play records already generated by the original network, which saves time. The deepening is performed by inserting new layers into the original network, and we present three insertion schemes based on the concept behind Net2Net. Lastly, of the many possible ways to sample the previously generated self-play records, we propose two methods that allow the deepened network to continue the training process. In our experiment extending a network from 20 to 40 residual blocks for $9 \times 9$ Go, the best-performing extension scheme obtains a 61.69% win rate against the unextended 20-block player while greatly reducing the time spent on self-play.
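The idea of deepening a residual network without changing its output can be illustrated with a small sketch. The abstract does not describe the paper's three insertion schemes, so the code below is not one of them. It shows only the general Net2Net-style principle under simplifying assumptions: each residual block uses fully connected layers instead of convolutions, has no batch normalization, and computes `relu(x + W2 * relu(W1 * x))`. A new block whose second weight matrix is zero then acts as the identity on the non-negative output of the block before it. All function names here are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    # Dense matrix-vector product; stands in for a convolution in this sketch.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(x, w1, w2):
    # Assumed block form: y = relu(x + W2 * relu(W1 * x))
    h = relu(matvec(w1, x))
    f = matvec(w2, h)
    return relu([xi + fi for xi, fi in zip(x, f)])

def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def zero_matrix(n):
    return [[0.0] * n for _ in range(n)]

def new_identity_block(n, rng):
    # W1 is random so it can learn later; W2 is zero, so the residual
    # branch contributes nothing and the block outputs relu(x).
    return (random_matrix(n, rng), zero_matrix(n))

def forward(x, blocks):
    for w1, w2 in blocks:
        x = residual_block(x, w1, w2)
    return x

def deepen(blocks, rng):
    # Insert one identity-initialized block after every existing block.
    # Each new block sees a post-ReLU (non-negative) input, so relu(x) == x.
    n = len(blocks[0][0])
    deeper = []
    for block in blocks:
        deeper.append(block)
        deeper.append(new_identity_block(n, rng))
    return deeper

if __name__ == "__main__":
    rng = random.Random(0)
    original = [(random_matrix(4, rng), random_matrix(4, rng)) for _ in range(2)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    extended = deepen(original, rng)
    print(len(original), "->", len(extended), "blocks")
    print("outputs match:", forward(x, original) == forward(x, extended))
```

Because the deepened network starts out computing the same function as the original, records generated by the original network remain consistent with it at the moment training resumes. With batch normalization or a different block layout, the zero initialization would have to be placed differently.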