Dimension-Adaptive MCTS: Optimal Sample Complexity for Continuous Action Planning

ICLR 2026 Conference Submission 19231 Authors

Published: 19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Monte-Carlo Tree Search; Continuous Reinforcement Learning Planning
Abstract: We study continuous-action Monte Carlo Tree Search (MCTS) in a $d$-dimensional action space when the optimal action-value function $Q^*(s,\cdot)$ is $\beta$-Hölder continuous with constant~$L$. We show that a dimension-adaptive $\varepsilon$-net schedule, combined with power-mean backups and a polynomial exploration bonus, finds an $\varepsilon$-optimal action in $ \tilde{O}\left(\sigma^2 L^{d/\beta} \varepsilon^{-(d/\beta+2)}\right) $ simulations, matching standard continuum-armed lower bounds up to logarithmic factors while remaining practical via on-demand, capped random nets. We further demonstrate that our method significantly outperforms baselines on continuous control planning problems. Our work bridges the gap between theoretical reinforcement learning and practical planning algorithms, providing a principled approach to exploration in high-dimensional continuous action spaces.
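To make the abstract's ingredients concrete, the following is a minimal one-step sketch (not the authors' implementation) of the three components it names: an on-demand, capped random $\varepsilon$-net over the action space, power-mean value backups, and a polynomial (rather than logarithmic) exploration bonus. All names (`plan`, `power_mean`, `noisy_q`), the cap schedule $t^{\alpha}$, and the bonus form are illustrative assumptions, and the stochastic oracle `reward` stands in for noisy rollout returns of $Q^*(s,\cdot)$ with values in $[0,1]$.

```python
import random

def power_mean(values, p):
    # Power-mean backup: ((1/n) * sum v^p)^(1/p); p=1 is the mean, p -> inf the max.
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def plan(reward, d=2, budget=2000, p=4.0, c=0.5, alpha=0.5, seed=0):
    """Illustrative one-step continuous-action planner (hypothetical sketch):
    capped random net over [0,1]^d, power-mean estimates, polynomial bonus.
    `reward` is a stochastic oracle with values in [0, 1]."""
    rng = random.Random(seed)
    arms = []  # list of (action, list of sampled returns)
    for t in range(1, budget + 1):
        cap = max(1, int(t ** alpha))  # net size grows polynomially with the budget
        if len(arms) < cap:
            a = tuple(rng.random() for _ in range(d))  # on-demand random net point
            arms.append((a, [reward(a)]))
            continue
        # UCB-style selection with a polynomial exploration bonus (assumed form)
        best = max(arms, key=lambda ar: power_mean(ar[1], p)
                   + c * (t ** alpha / len(ar[1])) ** 0.5)
        best[1].append(reward(best[0]))
    a_star, _ = max(arms, key=lambda ar: power_mean(ar[1], p))
    return a_star

# Example oracle: smooth (hence Hölder) reward peaked at (0.5, 0.5), plus noise.
def noisy_q(a):
    mean = 1.0 - sum((x - 0.5) ** 2 for x in a)
    return min(1.0, max(0.0, mean + random.gauss(0, 0.05)))
```

With a few thousand simulations this sketch reliably returns an action near the maximizer of the noiseless reward; the cap schedule plays the role of the dimension-adaptive net refinement, trading net resolution against per-point sample counts.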
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19231