Power Mean Estimation in Stochastic Monte-Carlo Tree Search

Published: 26 Apr 2024, Last Modified: 15 Jul 2024UAI 2024 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Monte-Carlo Tree Search, Monte-Carlo Planning
TL;DR: A novel MCTS algorithm for Stochastic MDPs, employing the optimism under uncertainty principle akin to UCT, and demonstrates its polynomial concentration of value estimation at the root node, complemented by empirical validation in stochastic tasks.
Abstract: Monte-Carlo Tree Search (MCTS) is a widely-used online planning strategy that effectively combines Monte-Carlo sampling with forward tree search to make optimal decisions in various real-world scenarios. Its success hinges on the application of the Upper Confidence bound for Trees (UCT) algorithm, an extension of the Upper Confidence bound (UCB) method used in multi-arm bandits. However, theoretical investigations of UCT have been found incomplete due to an error in the estimated "logarithmic" bonus term used for action selection in the tree. This issue was addressed by introducing a "polynomial" exploration bonus, which effectively balances the exploration-exploitation trade-off. However, most theoretical studies have focused on deterministic Markov Decision Processes (MDPs), neglecting stochastic settings. In this paper, we introduce Stochastic-Power-UCT, an MCTS algorithm explicitly designed for stochastic MDPs using power mean as the value estimator. We conduct a comprehensive study of the polynomial convergence assurance for selecting the optimal action and estimating the value at the root node within our methodology. Our findings demonstrate that Stochastic-Power-UCT shares the same convergence rate as UCT, with UCT being a special case of Stochastic-Power-UCT. Furthermore, we validate our theoretical results through empirical assessments across diverse stochastic MDP environments, providing empirical evidence to support our method's theoretical claims
List Of Authors: Tuan, Dam and Odalric-Ambrym Maillard and Emilie Kaufmann
Latex Source Code: zip
Signed License Agreement: pdf
Submission Number: 607
Loading