Domain Knowledge in Exploration Noise in AlphaZero

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference · Withdrawn Submission · Readers: Everyone
Keywords: Machine Learning, AlphaZero, Information Theory, Inductive Bias, MCTS, Monte Carlo Tree Search
Abstract: The AlphaZero algorithm has achieved remarkable success in a variety of sequential, perfect-information games, including Go, Shogi, and Chess. In the original paper, the only hyperparameter changed from game to game is the $\alpha$ parameter governing the Dirichlet exploration noise mixed into the search prior at the root. In this paper we investigate the properties of this hyperparameter. First, we build a formal intuition for its behavior on a toy example designed to isolate the influence of $\alpha$. Then, by comparing the performance of AlphaZero agents with different $\alpha$ values on Connect 4, we show that AlphaZero's performance improves considerably with a good choice of $\alpha$. This highlights the importance of $\alpha$ as an interpretable hyperparameter that allows for cross-game tuning in a way that more opaque hyperparameters, such as model architecture, may not.
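To make the mechanism the abstract refers to concrete: in AlphaZero, noise drawn from a Dirichlet distribution with concentration parameter $\alpha$ is mixed into the network's prior over root moves before search, via $P(s,a) = (1-\epsilon)\,p_a + \epsilon\,\eta_a$ with $\eta \sim \mathrm{Dir}(\alpha)$ and $\epsilon = 0.25$ in the AlphaZero paper. Below is a minimal NumPy sketch of that mixing step; the function name is illustrative, not from the paper.

```python
import numpy as np

def add_root_exploration_noise(priors, alpha, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into the root move prior (AlphaZero-style).

    priors  : 1-D array of prior probabilities over legal root moves.
    alpha   : Dirichlet concentration parameter -- the hyperparameter
              studied in this paper.
    epsilon : mixing weight; 0.25 in the AlphaZero paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet([alpha] * len(priors))   # eta ~ Dir(alpha)
    return (1.0 - epsilon) * priors + epsilon * noise

# Example: a flat prior over the 7 Connect 4 columns. A small alpha
# concentrates the noise mass on a few moves (deep exploration of a
# few lines); a large alpha spreads it almost uniformly.
flat_prior = np.full(7, 1.0 / 7.0)
print(add_root_exploration_noise(flat_prior, alpha=0.3))
```

The interpretability claim rests on this structure: $\alpha$ directly controls how concentrated or diffuse the injected exploration is, so it can be reasoned about per game (e.g. scaled to the typical number of legal moves), unlike opaque choices such as network architecture.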
One-sentence Summary: We demonstrate the value of the interpretability and flexibility of the hyperparameter governing exploration noise in AlphaZero's Monte Carlo Tree Search.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=cLIf0mn5kQ