Keywords: Uncertainty-aware planning, Monte Carlo Tree Search (MCTS), Generative world models, Vision–language models, Symbolic abstractions for long-horizon robotics
TL;DR: We propose an uncertainty-aware Monte Carlo Tree Search framework that integrates generative world models, vision–language progress signals, and multimodal LLM action priors to improve long-horizon robot planning.
Abstract: Robots acting in household environments must learn to plan long-horizon tasks in the presence of perceptual uncertainty, sparse rewards, and imperfect models of dynamics. While Monte Carlo Tree Search (MCTS) is a powerful tool for sequential decision making, its classical assumptions of an accurate simulator and well-shaped rewards do not hold in realistic robotic settings. In this work, we present an uncertainty-aware MCTS framework that combines a learned generative world model for imagined rollouts, a vision–language model (VLM) for progress-based reward shaping, and multimodal LLM (M-LLM) action priors. A hybrid upper confidence bound (UCB) integrates uncertainty from the world model, the VLM scorer, and the prior policy to balance exploration and risk aversion. In AI2-THOR long-horizon household tasks (15–25 steps), preliminary results suggest promising trends in success rate and planning efficiency compared to ablations (world-model only, shaping only, or priors only). While these findings are limited to simulation and remain to be validated more thoroughly, they illustrate a potential path toward safer and more effective deployment of learned generative models in robotics.
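The abstract describes a hybrid UCB that combines a value estimate, an M-LLM prior, a VLM progress signal, and a world-model uncertainty penalty. A minimal sketch of what such a selection score could look like is shown below; the function name, the PUCT-style exploration term, and all weights (`c_puct`, `risk_weight`, `shaping_weight`) are illustrative assumptions, not the paper's actual formulation.

```python
import math

def hybrid_ucb(q_value, prior_prob, visits_parent, visits_action,
               wm_uncertainty, vlm_progress,
               c_puct=1.5, risk_weight=0.5, shaping_weight=0.3):
    """Hypothetical hybrid UCB score for uncertainty-aware MCTS.

    q_value:        mean return of this action from imagined rollouts
    prior_prob:     M-LLM action prior probability (guides exploration)
    wm_uncertainty: world-model uncertainty for this transition (penalized)
    vlm_progress:   VLM-estimated task progress (reward shaping term)
    All weights are illustrative, not taken from the paper.
    """
    # PUCT-style exploration bonus weighted by the M-LLM prior
    exploration = c_puct * prior_prob * math.sqrt(visits_parent) / (1 + visits_action)
    return (q_value
            + shaping_weight * vlm_progress  # progress-based shaping
            + exploration                    # prior-guided exploration
            - risk_weight * wm_uncertainty)  # risk aversion under model uncertainty
```

Under this sketch, actions with uncertain imagined dynamics are down-weighted while actions the M-LLM prior favors receive a larger exploration bonus, which is one plausible way to realize the exploration/risk-aversion trade-off the abstract describes.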
Submission Number: 13