Uncertainty-Aware Planning with Generative World- and Language-Models via Monte Carlo Tree Search

Published: 01 Feb 2026, Last Modified: 01 Feb 2026, CoRL 2025 Workshop LEAP (Rolling), CC BY 4.0
Keywords: Uncertainty-aware planning, Monte Carlo Tree Search (MCTS), Generative world models, Vision–language models, Symbolic abstractions for long-horizon robotics
TL;DR: We propose an uncertainty-aware Monte Carlo Tree Search framework that integrates generative world models, vision–language progress signals, and multimodal LLM action priors to improve long-horizon robot planning.
Abstract: Robots acting in household environments must learn to plan long-horizon tasks in the presence of perceptual uncertainty, sparse rewards, and imperfect models of dynamics. While Monte Carlo Tree Search (MCTS) is a powerful tool for sequential decision making, its classical assumptions of an accurate simulator and well-shaped rewards do not hold in realistic robotic settings. In this work, we present an uncertainty-aware MCTS framework that combines a learned generative world model for imagined rollouts, a vision–language model (VLM) for progress-based shaping, and multimodal LLM (M-LLM) action priors. A hybrid upper confidence bound (UCB) integrates uncertainty from the world model, the VLM scorer, and the prior policy to balance exploration and risk aversion. In AI2-THOR long-horizon household tasks (15–25 steps), preliminary results suggest promising trends in success rate and planning efficiency compared to ablations (world-model only, shaping only, or priors only). While these findings are limited to simulation and remain to be validated more thoroughly, they illustrate a potential path toward safer and more effective deployment of learned generative models in robotics.
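The abstract describes a hybrid UCB that combines a value estimate, an exploration bonus scaled by the prior policy, and an uncertainty penalty. The paper does not give the exact formula, so the following is a minimal sketch of one plausible PUCT-style variant; the function name, the additive combination of the two uncertainty terms, and the weights `c_puct` and `lam` are all illustrative assumptions, not the authors' definition.

```python
import math

def hybrid_ucb(q_value, visit_count, parent_visits, prior,
               sigma_wm, sigma_vlm, c_puct=1.0, lam=0.5):
    """Illustrative uncertainty-aware selection score.

    q_value       - mean return from imagined rollouts for this action
    visit_count   - visits to this child node
    parent_visits - visits to the parent node
    prior         - M-LLM action prior probability (PUCT-style weighting)
    sigma_wm      - world-model predictive uncertainty (assumed scalar)
    sigma_vlm     - VLM progress-score uncertainty (assumed scalar)
    """
    # Standard PUCT exploration term, scaled by the language-model prior.
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visit_count)
    # Risk-aversion term: penalize actions whose imagined outcomes
    # the world model or VLM scorer is unsure about.
    risk_penalty = lam * (sigma_wm + sigma_vlm)
    return q_value + exploration - risk_penalty
```

Under this sketch, two actions with equal value and visit statistics are ranked by how confident the generative models are about their outcomes, which is one way to realize the "balance exploration and risk aversion" behavior the abstract claims.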
Submission Number: 13