Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control
Keywords: World Model for Robot Learning, Observation Intervention, Visual Model Predictive Control
Abstract: World models enable robots to “imagine” future
observations given current observations and planned actions, and
have been increasingly adopted as generalized dynamics models
to facilitate robot learning. Despite their promise, these models
remain brittle when encountering novel visual distractors such
as objects and background elements rarely seen during training.
Specifically, novel distractors can corrupt action outcome predictions,
causing downstream failures when robots rely on the world
model imaginations for planning or action verification. In this
work, we propose Reimagination with Observation Intervention
(ReOI), a simple yet effective test-time strategy that enables
world models to predict more reliable action outcomes in
open-world scenarios where novel and unanticipated visual distractors
are inevitable. Given the current robot observation, ReOI first
detects visual distractors by identifying which elements of the
scene degrade in physically implausible ways during world model
prediction. Then, it modifies the current observation to remove
these distractors and bring the observation closer to the training
distribution. Finally, ReOI “reimagines” future outcomes with
the modified observation and reintroduces the distractors
post hoc to preserve visual consistency for downstream planning and
verification. We validate our approach on a suite of robotic
manipulation tasks in the context of action verification, where the
verifier needs to select desired action plans based on predictions
from a world model. Our results show that ReOI is robust to both
in-distribution and out-of-distribution visual distractors. Notably,
it improves task success rates by up to 3× in the presence of
novel distractors, significantly outperforming action verification
that relies on world model predictions without imagination
interventions.
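
The detect, remove, reimagine, and reintroduce steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `reimagine_with_intervention`, the toy world model, the per-element masks, the constant background fill, and the mean-change threshold are all hypothetical stand-ins chosen for illustration. In particular, "degrades in physically implausible ways" is approximated here by "a scene element assumed to be static changes substantially in the first-pass prediction", which is only one possible proxy.

```python
import numpy as np

def toy_world_model(obs, action):
    """Hypothetical stand-in for a learned world model.
    Pixels brighter than 0.9 play the role of a novel distractor: they
    degrade in the prediction and the action outcome is lost."""
    pred = obs.copy()
    novel = obs > 0.9
    if novel.any():
        pred[novel] = 0.0          # the distractor itself degrades
    else:
        pred[0, 0] += action       # intended action outcome
    return pred

def reimagine_with_intervention(obs, action, world_model, masks,
                                background, threshold=0.1):
    """Sketch of the four steps; masks are boolean arrays, one per
    candidate scene element that is assumed to stay static."""
    first_pass = world_model(obs, action)
    # Step 1: flag elements whose pixels change strongly although static.
    distractor = np.zeros(obs.shape, dtype=bool)
    for m in masks:
        if np.abs(first_pass[m] - obs[m]).mean() > threshold:
            distractor |= m
    # Step 2: remove flagged elements (assumed background fill).
    cleaned = np.where(distractor, background, obs)
    # Step 3: reimagine from the modified observation.
    pred = world_model(cleaned, action)
    # Step 4: paste the original distractor pixels back post hoc.
    return np.where(distractor, obs, pred), distractor

# Toy scene: background 0.2, a familiar element (0.5), a novel one (1.0).
obs = np.full((4, 4), 0.2)
familiar_mask = np.zeros((4, 4), dtype=bool)
familiar_mask[0, 2:] = True
novel_mask = np.zeros((4, 4), dtype=bool)
novel_mask[2:, 2:] = True
obs[familiar_mask] = 0.5
obs[novel_mask] = 1.0

naive = toy_world_model(obs, action=0.3)
final, flagged = reimagine_with_intervention(
    obs, 0.3, toy_world_model, [familiar_mask, novel_mask], background=0.2)
```

In this toy setting, the naive prediction loses the action outcome at pixel (0, 0), while the reimagined prediction recovers it and still shows the distractor in its original place. A downstream verifier could then score candidate action plans on such reimagined frames instead of the raw predictions.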
Supplementary Material: zip
Submission Number: 22