Doubly Robust Monte Carlo Tree Search

ICLR 2026 Conference Submission 14644 Authors

19 Sept 2025 (modified: 29 Nov 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Monte Carlo Tree Search, Doubly Robust Estimation, Off-Policy Evaluation, Decision-Making
TL;DR: We introduce Doubly Robust Monte Carlo Tree Search, an algorithm that combines MCTS with doubly robust off-policy estimation to achieve stronger performance and more sample-efficient value estimation in complex decision-making environments.
Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), an algorithm that integrates doubly robust off-policy estimation into MCTS to improve sample efficiency. Our hybrid estimator combines Monte Carlo rollouts with DR estimation through a variance-minimizing weight computed online. Unlike biased bootstrapping methods that sacrifice asymptotic correctness, DR-MCTS achieves variance reduction while preserving unbiasedness. Unlike entropy-based approaches that exhibit domain-dependent performance, DR-MCTS demonstrates consistent improvements across diverse settings including game playing, mathematical reasoning, and embodied planning. The benefits are particularly pronounced in LLM-augmented environments where each simulation is computationally expensive, making DR-MCTS well-suited for the growing class of language-model-guided planning applications.
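The abstract describes a hybrid backup that blends Monte Carlo rollout returns with a doubly robust (DR) off-policy estimate through a variance-minimizing weight computed online. A minimal sketch of one plausible form is below; the per-decision DR recursion, the inverse-variance weighting rule, and all function and variable names are illustrative assumptions, not the paper's exact formulation.

```python
def dr_value_estimate(trajectory, v_hat, q_hat, behavior_probs, target_probs, gamma=0.99):
    """Per-decision doubly robust value estimate for one rollout (standard OPE form).

    trajectory:      list of (state, action, reward) tuples from the rollout policy
    v_hat, q_hat:    baseline value / action-value models used by the DR correction
    behavior_probs:  pi_b(a|s) along the trajectory (rollout policy)
    target_probs:    pi(a|s) along the trajectory (tree/target policy)
    """
    v_dr = 0.0
    # Walk the trajectory backwards, applying the recursive DR correction at each step.
    for (s, a, r), p_b, p_t in zip(reversed(trajectory),
                                   reversed(behavior_probs),
                                   reversed(target_probs)):
        rho = p_t / p_b  # per-step importance weight
        v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr


def hybrid_backup(mc_return, dr_estimate, var_mc, var_dr, eps=1e-8):
    """Variance-minimizing convex combination of the Monte Carlo return and the
    DR estimate (inverse-variance weighting), used as the value backed up the tree."""
    alpha = var_dr / (var_mc + var_dr + eps)  # weight placed on the MC return
    return alpha * mc_return + (1.0 - alpha) * dr_estimate
```

Under independence of the two estimators, inverse-variance weighting is the variance-minimizing convex combination; if both inputs are unbiased, the blend remains unbiased, which is consistent with the abstract's claim of variance reduction without sacrificing asymptotic correctness.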
Primary Area: reinforcement learning
Submission Number: 14644