Keywords: Monte Carlo Tree Search; Continuous Reinforcement Learning Planning
Abstract: We study continuous-action Monte Carlo Tree Search (MCTS) in a $d$-dimensional action space when the
optimal action-value function $Q^*(s,\cdot)$ is $\beta$-Hölder continuous with constant~$L$. We show that a
dimension-adaptive $\varepsilon$-net schedule combined with power-mean backups and a polynomial exploration
bonus finds an $\varepsilon$-optimal action in $\tilde{O}\!\left(\sigma^2 L^{d/\beta}\,\varepsilon^{-(d/\beta+2)}\right)$
simulations, matching standard continuum-armed lower bounds up to logs while remaining practical
via on-demand, capped random nets. We further demonstrate that our method significantly outperforms
baselines on continuous control planning problems. Our work bridges the gap between theoretical
reinforcement learning and practical planning algorithms, providing a principled approach to
exploration in high-dimensional continuous action spaces.
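The abstract names three ingredients: an on-demand, capped random net of candidate actions, a polynomial exploration bonus, and power-mean backups. Below is a minimal single-node sketch of how these pieces could fit together, assuming a generic black-box simulator; the square-root widening rule, the bonus form $c\,N^{\alpha}/n^{\rho}$, and all names and hyperparameters (`rollout`, `cap`, `p`, `c`, `alpha`, `rho`) are illustrative assumptions, not the paper's algorithm or API.

```python
import numpy as np

# Illustrative sketch only: one node of a continuous-action search,
# combining (1) an on-demand, capped random action net, (2) a polynomial
# exploration bonus, and (3) a power-mean backup of the node value.
rng = np.random.default_rng(0)
d = 2                           # action dimension
cap = 64                        # cap on the random net size (assumed)
p = 4.0                         # power-mean exponent (assumed)
c, alpha, rho = 1.0, 0.25, 0.5  # polynomial-bonus hyperparameters (assumed)

def rollout(a):
    """Noisy black-box return; stands in for a simulator call."""
    return -np.sum((a - 0.3) ** 2) + 0.1 * rng.normal()

actions, counts, means = [], [], []

def maybe_widen(N):
    """Grow the random net on demand (here: ~sqrt(N) arms), up to `cap`."""
    if len(actions) < min(cap, int(np.ceil((N + 1) ** 0.5))):
        actions.append(rng.uniform(0.0, 1.0, size=d))
        counts.append(0)
        means.append(0.0)

for t in range(2000):
    maybe_widen(t)
    # Polynomial exploration bonus; unvisited actions are tried first.
    scores = [m + c * (t + 1) ** alpha / n ** rho if n > 0 else np.inf
              for m, n in zip(means, counts)]
    i = int(np.argmax(scores))
    r = rollout(actions[i])
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]    # incremental mean update

# Power-mean backup: weighted generalized mean of child estimates
# (shifted so the base of the p-th power is nonnegative).
w = np.array(counts) / sum(counts)
q = np.array(means)
shift = q.min()
v = (w @ (q - shift) ** p) ** (1.0 / p) + shift
print("best action:", actions[int(np.argmax(means))], "backed-up value:", v)
```

For $p \to \infty$ the backup approaches a max over children and for $p = 1$ it reduces to the usual weighted average, which is the intuition behind power-mean backups as an interpolation between the two.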
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19231