Distributional Monte-Carlo Tree Search with Thompson Sampling in Stochastic Environments

ICLR 2026 Conference Submission19250 Authors

19 Sept 2025 (modified: 08 Oct 2025)
License: CC BY 4.0
Keywords: Monte-Carlo Tree Search, Planning under Uncertainty
Abstract: We study Monte-Carlo Tree Search (MCTS), a class of reinforcement learning algorithms, in stochastic settings. MCTS has excelled in deterministic domains but can struggle in highly stochastic scenarios, where transition randomness and partial observability lead to underexploration and suboptimal value estimates. To address these challenges, we integrate \emph{distributional} Reinforcement Learning (RL) with Thompson Sampling and an optimistic exploration bonus, resulting in two novel \emph{distributional MCTS} algorithms: CATSO (Categorical Thompson Sampling with Optimistic Bonus) and PATSO (Particle Thompson Sampling with Optimistic Bonus). In both methods, each Q-node in the search tree maintains a distribution of returns, via either a fixed set of categorical atoms (CATSO) or a dynamic set of particles (PATSO). We then employ Thompson Sampling together with a polynomial optimism bonus to drive exploration in stochastic environments. Theoretically, we show that both algorithms attain a non-asymptotic, problem-dependent simple regret bound of $\mathcal{O}(n^{-1/2})$. Empirical evaluations confirm that our distributional approach significantly improves performance over existing baselines, demonstrating its potential for robust online planning under uncertainty.
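To make the CATSO mechanics above concrete, the following minimal Python sketch shows Thompson Sampling over a categorical return distribution at a Q-node, combined with a polynomial optimism bonus. This is an illustration under our own assumptions, not the authors' implementation: the Dirichlet posterior over atom probabilities, the fixed support $[v_{\min}, v_{\max}]$, and the bonus form $c\,N^{\alpha}/(1+n)^{\beta}$ are all placeholders, since the abstract does not specify these details.

```python
import numpy as np

# Minimal sketch (not the submission's code) of Thompson Sampling over a
# categorical return distribution with a polynomial optimism bonus.
# ASSUMPTIONS: the Dirichlet posterior, the atom range [v_min, v_max],
# and the bonus exponents alpha/beta are illustrative choices only.

class CategoricalQNode:
    """Q-node maintaining a categorical distribution of returns."""

    def __init__(self, n_atoms=51, v_min=-1.0, v_max=1.0):
        self.atoms = np.linspace(v_min, v_max, n_atoms)  # fixed support
        self.counts = np.ones(n_atoms)                   # Dirichlet pseudo-counts
        self.visits = 0

    def update(self, g):
        """Project a sampled return g onto the nearest atom; update counts."""
        idx = int(np.argmin(np.abs(self.atoms - g)))
        self.counts[idx] += 1.0
        self.visits += 1

    def thompson_value(self, rng):
        """Draw atom probabilities from the Dirichlet posterior and
        return the corresponding expected return (one TS sample)."""
        probs = rng.dirichlet(self.counts)
        return float(probs @ self.atoms)


def select_action(children, parent_visits, rng, c=1.0, alpha=0.5, beta=1.0):
    """Pick the child maximizing TS value + polynomial optimism bonus.

    bonus = c * parent_visits**alpha / (1 + child.visits)**beta is an
    assumed form of the 'polynomial optimism bonus'; the abstract does
    not give the exact exponents.
    """
    scores = [
        ch.thompson_value(rng)
        + c * parent_visits**alpha / (1.0 + ch.visits) ** beta
        for ch in children
    ]
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    children = [CategoricalQNode() for _ in range(3)]
    for t in range(200):
        a = select_action(children, parent_visits=t + 1, rng=rng)
        # Stand-in stochastic returns; in MCTS these would come from rollouts.
        g = rng.normal(loc=[0.1, 0.3, -0.2][a], scale=0.5)
        children[a].update(np.clip(g, -1.0, 1.0))
    print([ch.visits for ch in children])
```

A PATSO-style variant would presumably replace the fixed atom support with a dynamic particle set built from observed returns, leaving the selection rule unchanged.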
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19250