On Effective Parallelization of Monte Carlo Tree SearchDownload PDF

28 Sept 2020 (modified: 05 May 2023)ICLR 2021 Conference Withdrawn SubmissionReaders: Everyone
Keywords: parallel Monte Carlo Tree Search (MCTS), Upper Confidence bound for Trees (UCT), Reinforcement Learning (RL)
Abstract: Despite its groundbreaking success in Go and computer games, Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree, which calls for effective parallelization. However, how to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. In this paper, we seek to lay its first theoretical foundation, by examining the potential performance loss caused by parallelization when achieving a desired speedup. In particular, we discover two necessary conditions for achieving a desirable parallelization performance and highlight two of their practical benefits. First, by examining whether existing parallel MCTS algorithms satisfy these conditions, we identify key design principles that should be inherited by future algorithms. Specifically, we find that, although no existing algorithm satisfies both conditions in general (i.e., they are still far from optimal), pre-adjusting the nodes visit counts (used in VL-UCT (Segal, 2010) and WU-UCT (Liu et al., 2020)) is essential. In particular, WU-UCT pre-adjusts visit counts in a way that always satisfies one necessary condition while its other designs meet the other condition when the maximum tree depth is 2. We theoretically establish that they together facilitate $\mathcal{O} ( \ln n + M / \sqrt{\ln n} )$ cumulative regret under the depth-2 setting, where $n$ is the number of rollouts and $M$ is the number of workers. A regret of this form is highly desirable, as compared to $\mathcal{O} ( \ln n )$ regret incurred by a sequential counterpart, its excess part approaches zero as $n$ increases. Second, and more importantly, we demonstrate how the proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms. To illustrate this, we propose a new parallel MCTS algorithm, called BU-UCT, by following our theoretical guidelines. The newly proposed algorithm, albeit preliminary, out-performs four competitive baselines on 11 out of 15 Atari games. We hope our theoretical results could inspire future work of more effective parallel MCTS.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=DgfSXBuy5N
11 Replies

Loading