Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon; Hyeonseo Cho; Doojin Baek; Yoshua Bengio; Sungjin Ahn

Published: 01 May 2025, Last Modified: 24 Jul 2025. Venue: ICML 2025 spotlight poster. License: CC BY 4.0

TL;DR: Monte Carlo Tree Diffusion (MCTD) merges MCTS with diffusion-based partial denoising, enabling superior long-horizon planning by adaptively exploring and refining trajectories.

Abstract: Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves as inference-time computation scales, standard diffusion-based planners offer only limited avenues for such scaling. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as control over the exploration-exploitation trade-off, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.
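To make the idea of tree-structured denoising concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical stand-in denoiser (`toy_denoise`), a hypothetical reward (negative squared deviation from a straight line to the goal), standard UCB1 selection, and a one-jump completion of the remaining denoising as the rollout. All names and parameters are illustrative assumptions; the abstract does not specify these details.

```python
import math
import random


class Node:
    """One partially denoised plan in the search tree (illustrative)."""

    def __init__(self, plan, noise_level, parent=None):
        self.plan = plan                # list of floats: the current plan
        self.noise_level = noise_level  # 1.0 = pure noise, 0.0 = fully denoised
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0


def toy_denoise(plan, goal, noise_level, step, rng):
    """Hypothetical stand-in for a learned denoiser (an assumption).

    Pulls the plan toward a straight line ending at `goal` and injects
    noise proportional to the current noise level, so coarse jumps from
    noisy plans yield rougher results than fine steps near the end.
    """
    frac = min(1.0, step / noise_level)
    n = len(plan)
    out = []
    for i, x in enumerate(plan):
        target = goal * (i + 1) / n
        out.append(x + frac * (target - x) + rng.gauss(0.0, 0.2 * noise_level))
    return out, max(0.0, noise_level - step)


def reward(plan, goal):
    """Toy reward: negative mean squared deviation from the straight line."""
    n = len(plan)
    return -sum((x - goal * (i + 1) / n) ** 2 for i, x in enumerate(plan)) / n


def ucb(child, parent_visits, c=1.4):
    """Standard UCB1 score balancing exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(goal=1.0, horizon=8, iterations=50, branching=3, step=0.25, seed=0):
    rng = random.Random(seed)
    root = Node([rng.gauss(0.0, 1.0) for _ in range(horizon)], 1.0)
    best_plan, best_reward = None, -math.inf
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is non-terminal and fully expanded.
        while node.noise_level > 0 and len(node.children) == branching:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # Expansion: add a child that is one denoising step cleaner.
        if node.noise_level > 0:
            plan, level = toy_denoise(node.plan, goal, node.noise_level, step, rng)
            child = Node(plan, level, parent=node)
            node.children.append(child)
            node = child
        # Simulation: finish the remaining denoising in a single coarse jump.
        if node.noise_level > 0:
            final, _ = toy_denoise(
                node.plan, goal, node.noise_level, node.noise_level, rng
            )
        else:
            final = node.plan
        r = reward(final, goal)
        if r > best_reward:
            best_plan, best_reward = final, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += r
            node = node.parent
    return root, best_plan, best_reward


if __name__ == "__main__":
    for budget in (10, 50, 200):
        _, _, best = search(iterations=budget)
        print(budget, round(best, 4))
```

In this toy, the best reward found cannot decrease as the iteration budget grows, because a larger budget replays the smaller one (same seed) and then continues searching. This only mirrors the inference-time scaling claim in spirit and says nothing about the paper's actual results.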

Lay Summary: We introduce Monte Carlo Tree Diffusion (MCTD), a novel AI planning method that combines diffusion models (used for generating images and text) with Monte Carlo Tree Search (the algorithm behind AlphaGo). Unlike traditional diffusion approaches that generate entire action sequences at once, MCTD builds plans incrementally through tree-structured search: it explores different possibilities, evaluates their outcomes, and refines promising paths. This enables more thoughtful, strategic planning. In tests on challenging tasks such as robot navigation and object manipulation, MCTD significantly outperforms existing methods. Importantly, it scales well: given more computation time, it consistently finds better solutions, making it well suited to complex problems where solution quality matters. This work bridges generative AI with strategic reasoning, advancing how AI systems plan long-term actions.

Primary Area: Reinforcement Learning->Planning

Keywords: Diffusion, MCTS, Long-term Planning, Offline RL, Goal-conditioned RL, Inference-Time Scaling

Submission Number: 13917
