Keywords: Monte Carlo Tree Search, Doubly Robust Estimation, Off-Policy Evaluation, Decision-Making
TL;DR: We introduce Doubly Robust Monte Carlo Tree Search (DR-MCTS), an algorithm that combines MCTS with doubly robust off-policy estimation to improve sample efficiency and value-estimation accuracy over standard MCTS in complex decision-making environments.
Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates doubly robust off-policy estimation into MCTS to improve sample efficiency in computationally expensive environments. Our approach employs an adaptive hybrid estimator that dynamically balances Monte Carlo rollouts with doubly robust estimation through variance-minimizing weights computed online from empirical statistics. We provide theoretical guarantees for unbiasedness and establish conditions for variance reduction. Empirically, DR-MCTS shows consistent improvements across diverse domains: competitive game playing (9×9 Go), mathematical reasoning (GSM8K), and embodied planning (VirtualHome). While providing modest gains in traditional domains, DR-MCTS excels in LLM-augmented environments, achieving 3× higher success rates than standard MCTS on complex compositional tasks while reducing computational costs by over 50%. Notably, entropy-based methods (MENTS, BTS, DENTS) fail to complete tasks within the same computational budgets. These results highlight how variance reduction becomes increasingly valuable when simulations involve expensive language model queries, making DR-MCTS particularly suited for the growing class of LLM-guided planning applications.
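To make the "variance-minimizing weights computed online from empirical statistics" concrete, below is a minimal sketch of one natural reading: the node value is a convex combination α·MC + (1−α)·DR, with α set to the standard variance-minimizing weight for two correlated estimators. The function name `hybrid_estimate`, the exact weight formula, and the fallback of α = 0.5 in the degenerate case are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hybrid_estimate(mc_returns, dr_returns):
    """Variance-minimizing convex combination of Monte Carlo (MC) rollout
    estimates and doubly robust (DR) estimates at a search node.

    mc_returns, dr_returns: sequences (length >= 2) of per-simulation return
    estimates. The weight alpha* = (Var_DR - Cov) / (Var_MC + Var_DR - 2*Cov)
    minimizes the variance of alpha * MC + (1 - alpha) * DR.
    """
    mc_returns = np.asarray(mc_returns, dtype=float)
    dr_returns = np.asarray(dr_returns, dtype=float)

    # Empirical (sample) variances and covariance, computed online per node.
    var_mc = mc_returns.var(ddof=1)
    var_dr = dr_returns.var(ddof=1)
    cov = np.cov(mc_returns, dr_returns, ddof=1)[0, 1]

    denom = var_mc + var_dr - 2.0 * cov
    if denom <= 0.0:
        alpha = 0.5  # assumed fallback: equal weighting when the weight is ill-defined
    else:
        # Clip to [0, 1] so the result stays a convex mixture of the two estimators.
        alpha = np.clip((var_dr - cov) / denom, 0.0, 1.0)

    return alpha * mc_returns.mean() + (1.0 - alpha) * dr_returns.mean()

# Example: combine five rollout estimates with five DR estimates.
v = hybrid_estimate([1.0, 0.4, 0.7, 0.9, 0.2], [0.6, 0.5, 0.7, 0.6, 0.55])
```

Keeping α in [0, 1] means the hybrid remains a convex combination, so if both component estimators are unbiased, the combination is too, consistent with the unbiasedness guarantee claimed in the abstract.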
Primary Area: reinforcement learning
Submission Number: 14644