Envision the Future in Open-World Dynamic Tasks by a Hierarchical World Model with Residual Enhanced Foresight

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: hierarchical world model, model-based reinforcement learning, visually grounded task-planning representation
TL;DR: We present ResDreamer, a hierarchical world model with residually connected visual planning representations.
Abstract: Interacting with proactive agents in open-world environments remains a core challenge for reinforcement learning, as other participants exhibit reciprocal and even adversarial behavior. Effective reasoning representations are crucial in such settings. Although language- or vision-grounded approaches have shown promise, most depend on large-scale pre-training and extensive domain-specific fine-tuning. Drawing inspiration from the neuroscience phenomenon of active gaze control—where humans proactively direct their gaze toward the predicted future location of a dynamic object (e.g., a cricket ball, blade, or oncoming vehicle) well before distinguishing visual features appear—we propose ResDreamer, a brain-inspired hierarchical world model that employs residually connected visual planning representations. In ResDreamer, each higher-level layer is trained on the reconstruction residuals of the layer below, enabling it to progressively capture increasingly advanced world dynamics and build a richer internal representation. Every layer incorporates augmented observations that include foresight images, which are further modulated by top-down residual prediction signals. This mechanism yields highly informative, predictive, and knowledge-driven visual reasoning representations without external supervision. Empirical results demonstrate that ResDreamer achieves higher sample efficiency, parameter efficiency, and scalability than state-of-the-art baselines, paving the way for more adaptive agents in open-ended, dynamic, and interactive environments.
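The core residual idea in the abstract — each higher level trained only on the reconstruction residual left by the level below — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the linear/tanh feature maps, toy dynamics, and two-level depth are all assumptions chosen to make the residual-stacking mechanism visible in a few lines.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                      # batch of observations
# Toy next-step dynamics with a linear part plus a nonlinear part.
Y = X @ rng.normal(size=(8, 8)) + np.tanh(X) @ rng.normal(size=(8, 8))

def fit(features, target):
    """One level's predictive model: a least-squares stand-in
    for the paper's learned dynamics module."""
    W, *_ = np.linalg.lstsq(features, target, rcond=None)
    return W

# Level 1: coarse linear model of the dynamics.
W1 = fit(X, Y)
res1 = Y - X @ W1                                  # reconstruction residual

# Level 2: trained only on level 1's residual, using richer
# (here, nonlinear) features — "increasingly advanced dynamics".
F2 = np.tanh(X)
W2 = fit(F2, res1)
res2 = res1 - F2 @ W2

# The residual norm shrinks as levels are stacked.
print(np.linalg.norm(res1), np.linalg.norm(res2))
```

Because level 2 only ever sees what level 1 failed to explain, the stacked prediction `X @ W1 + F2 @ W2` sums each level's contribution, mirroring how the hierarchy composes its internal representation.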
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20195