- TL;DR: We developed an effective parallel UCT algorithm that achieves linear speedup and suffers negligible performance loss.
- Abstract: Monte Carlo Tree Search (MCTS) algorithms have achieved great success on many challenging benchmarks (e.g., Computer Go). However, they generally require a large number of rollouts, making their applications to planning costly. Furthermore, it is also extremely challenging to parallelize MCTS due to its inherent sequential nature: each rollout heavily relies on the statistics (e.g., node visitation counts) estimated from previous simulations to achieve an effective exploration-exploitation tradeoff. In spite of these difficulties, we develop an algorithm, P-UCT, to effectively parallelize MCTS, which achieves linear speedup and exhibits negligible performance loss with an increasing number of workers. The key idea in P-UCT is a set of statistics that we introduce to track the number of on-going yet incomplete simulation queries (named as unobserved samples). These statistics are used to modify the UCT tree policy in the selection steps in a principled manner to retain effective exploration-exploitation tradeoff when we parallelize the most time-consuming expansion and simulation steps. Experimental results on a proprietary benchmark and the public Atari Game benchmark demonstrate the near-optimal linear speedup and the superior performance of P-UCT when compared to these existing techniques.
- Keywords: parallel Monte Carlo Tree Search (MCTS), Upper Confidence bound for Trees (UCT), Reinforcement Learning (RL)
- Original Pdf: pdf