Keywords: Monte Carlo Tree Search, Doubly Robust Estimation, Off-Policy Evaluation, Decision-Making
TL;DR: We introduce Doubly Robust Monte Carlo Tree Search (DR-MCTS), an algorithm that combines MCTS with doubly robust off-policy estimation to improve sample efficiency and value-estimation accuracy over standard MCTS in complex decision-making environments.
Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates doubly robust off-policy estimation into MCTS to improve sample efficiency in computationally expensive environments. Our approach employs an adaptive hybrid estimator that dynamically balances Monte Carlo rollouts with doubly robust estimation through variance-minimizing weights computed online from empirical statistics. We provide theoretical guarantees for unbiasedness and establish conditions for variance reduction. Empirically, DR-MCTS shows consistent improvements across diverse domains: competitive game playing (9×9 Go), mathematical reasoning (GSM8K), and embodied planning (VirtualHome). While providing modest gains in traditional domains, DR-MCTS excels in LLM-augmented environments, achieving 3× higher success rates than standard MCTS on complex compositional tasks while reducing computational costs by over 50%. Notably, entropy-based methods (MENTS, BTS, DENTS) fail to complete tasks within the same computational budgets. These results highlight how variance reduction becomes increasingly valuable when simulations involve expensive language model queries, making DR-MCTS particularly suited for the growing class of LLM-guided planning applications.
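To make the "variance-minimizing weights computed online from empirical statistics" concrete, below is a minimal sketch of one natural reading: the node value is a convex combination α·MC + (1−α)·DR, with α set to the standard variance-minimizing weight for two correlated estimators. The function name `hybrid_estimate`, the exact weight formula, and the fallback of α = 0.5 in the degenerate case are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hybrid_estimate(mc_returns, dr_returns):
    """Variance-minimizing convex combination of Monte Carlo (MC) rollout
    estimates and doubly robust (DR) estimates at a search node.

    mc_returns, dr_returns: sequences (length >= 2) of per-simulation return
    estimates. The weight alpha* = (Var_DR - Cov) / (Var_MC + Var_DR - 2*Cov)
    minimizes the variance of alpha * MC + (1 - alpha) * DR.
    """
    mc_returns = np.asarray(mc_returns, dtype=float)
    dr_returns = np.asarray(dr_returns, dtype=float)

    # Empirical (sample) variances and covariance, computed online per node.
    var_mc = mc_returns.var(ddof=1)
    var_dr = dr_returns.var(ddof=1)
    cov = np.cov(mc_returns, dr_returns, ddof=1)[0, 1]

    denom = var_mc + var_dr - 2.0 * cov
    if denom <= 0.0:
        alpha = 0.5  # assumed fallback: equal weighting when the weight is ill-defined
    else:
        # Clip to [0, 1] so the result stays a convex mixture of the two estimators.
        alpha = np.clip((var_dr - cov) / denom, 0.0, 1.0)

    return alpha * mc_returns.mean() + (1.0 - alpha) * dr_returns.mean()

# Example: combine five rollout estimates with five DR estimates.
v = hybrid_estimate([1.0, 0.4, 0.7, 0.9, 0.2], [0.6, 0.5, 0.7, 0.6, 0.55])
```

Keeping α in [0, 1] means the hybrid remains a convex combination, so if both component estimators are unbiased, the combination is too, consistent with the unbiasedness guarantee claimed in the abstract.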
Primary Area: reinforcement learning
Submission Number: 14644