Keywords: Machine Learning, AlphaZero, Information Theory, Inductive Bias, Monte Carlo Tree Search (MCTS)
Abstract: The AlphaZero algorithm has achieved remarkable success in a variety of sequential, perfect-information games, including Go, shogi, and chess. In the original paper, the only hyperparameter changed from game to game is $\alpha$, the concentration parameter of the Dirichlet noise added to the prior policy at the root of the search tree. In this paper we investigate the properties of this hyperparameter. First, we build a formal intuition for its behavior on a toy example designed to isolate the influence of $\alpha$. Then, by comparing the performance of AlphaZero agents with different $\alpha$ values on Connect 4, we show that the performance of AlphaZero improves considerably with a good choice of $\alpha$. Together, these results highlight the value of $\alpha$ as an interpretable hyperparameter that permits cross-game tuning in a way that more opaque hyperparameters, such as model architecture, do not.
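For concreteness, the mechanism the abstract refers to works as follows: AlphaZero mixes Dirichlet noise into the prior policy at the root of each Monte Carlo Tree Search, $P(s,a) = (1-\epsilon)\,p_a + \epsilon\,\eta_a$ with $\eta \sim \mathrm{Dir}(\alpha)$ and $\epsilon = 0.25$ in the original paper. Below is a minimal Python/NumPy sketch of this step; the function and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def apply_root_dirichlet_noise(priors, alpha, epsilon=0.25, rng=None):
    """Mix Dirichlet(alpha) noise into the root prior policy, as in AlphaZero:
    P(s, a) = (1 - epsilon) * p_a + epsilon * eta_a, where eta ~ Dir(alpha)."""
    rng = rng or np.random.default_rng()
    # One symmetric Dirichlet sample over all legal actions at the root.
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise

# Per-game alpha values reported in the AlphaZero paper, roughly inversely
# proportional to the typical number of legal moves:
# chess: 0.3, shogi: 0.15, Go: 0.03.
priors = np.full(7, 1.0 / 7)  # e.g. a uniform prior over Connect 4's 7 columns
noisy_priors = apply_root_dirichlet_noise(priors, alpha=1.0)
```

Smaller $\alpha$ concentrates the noise on a few actions, giving occasional large exploration pushes; larger $\alpha$ spreads it more uniformly.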
One-sentence Summary: We demonstrate the value of the interpretability and flexibility of $\alpha$, the hyperparameter governing exploration noise in the Monte Carlo Tree Search of AlphaZero.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=cLIf0mn5kQ