Envisioning the Future in Open-World Dynamic Tasks via a Hierarchical World Model with Residual-Enhanced Foresight
Keywords: hierarchical world model, model-based reinforcement learning, visually grounded task-planning representation
TL;DR: We present ResDreamer, a hierarchical world model with residually connected visual planning representations.
Abstract: Interacting with dynamic objects and even opponent agents in an open world remains a challenge for reinforcement learning.
Task planning representations are crucial in such scenarios. Existing reasoning representations grounded in language or vision have demonstrated efficacy, yet most require pretraining and fine-tuning on domain-specific knowledge datasets.
We argue that a reasoning representation learned purely from self-supervised environmental interactions, combined with a brain-like hierarchical structure, offers substantial value for open-world dynamic tasks.
In this paper, we present ResDreamer, a hierarchical world model with residually connected visual planning representations.
In ResDreamer, the high-level world model observes reconstruction residuals from the lower layers, aiming to capture more advanced world dynamics and form a more comprehensive internal representation of the world.
Each layer of the world model employs augmented environmental observations, which include visual foresight reconstructed from imagined trajectories. These augmented observations are further calibrated by residuals predicted by the higher-level world model.
Our approach demonstrates higher sample efficiency, parameter efficiency, and scalability than state-of-the-art methods.
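The residual-passing scheme described in the abstract can be sketched as a toy two-layer loop. This is a minimal illustration, not the authors' implementation: the linear `ToyLayer` stands in for a learned world-model layer, and all names (`ToyLayer`, `hierarchical_step`) are hypothetical. It shows only the data flow the abstract describes: the low layer's reconstruction residual is passed upward, the high layer's prediction of that residual calibrates the foresight, and the calibrated foresight augments the raw observation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyLayer:
    """Hypothetical stand-in for one world-model layer.

    A fixed linear map plays the role of a learned reconstruction;
    a real layer would be a trained latent dynamics model.
    """
    def __init__(self, dim):
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def reconstruct(self, x):
        return x @ self.W

def hierarchical_step(obs, low, high):
    """One observation-augmentation step in the sketched hierarchy."""
    recon = low.reconstruct(obs)
    residual = obs - recon                           # what the low layer missed
    predicted_residual = high.reconstruct(residual)  # high layer models the residual
    foresight = recon + predicted_residual           # residual-calibrated foresight
    augmented_obs = np.concatenate([obs, foresight]) # augmented observation
    return augmented_obs, residual

dim = 4
low, high = ToyLayer(dim), ToyLayer(dim)
obs = rng.normal(size=dim)
aug, res = hierarchical_step(obs, low, high)
print(aug.shape)  # (8,)
```

Under this reading, each layer consumes the raw observation concatenated with a foresight signal that the layer above has corrected, which is one plausible way to realize the "residually connected visual planning representations" named in the TL;DR.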
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20195