Envisioning the Future in Open-World Dynamic Tasks via a Hierarchical World Model with Residual-Enhanced Foresight
Keywords: hierarchical world model, model-based reinforcement learning, visually grounded task-planning representation
TL;DR: We present ResDreamer, a hierarchical world model with residually connected visual planning representations.
Abstract: Interacting with dynamic objects and even opponent agents in an open world remains a challenge for reinforcement learning.
Task planning representations are crucial in such scenarios. Existing reasoning representations grounded in language or vision have demonstrated efficacy, yet most require pretraining and fine-tuning on domain-specific knowledge datasets.
We argue that a reasoning representation learned purely from self-supervised environmental interactions, combined with a brain-like hierarchical structure, offers substantial value for open-world dynamic tasks.
In this paper, we present ResDreamer, a hierarchical world model with residually connected visual planning representations.
In ResDreamer, the high-level world model observes reconstruction residuals from the lower layers, aiming to capture more advanced world dynamics and form a more comprehensive internal representation of the world.
Each layer of the world model employs augmented environmental observations, which include visual foresight reconstructed from imagined trajectories. These augmented observations are further calibrated by residuals predicted by the higher-level world model.
Our approach demonstrates higher sample efficiency, parameter efficiency, and scalability than state-of-the-art methods.
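The residual-passing scheme described in the abstract can be sketched as a toy two-layer loop. This is a minimal illustration, not the authors' implementation: the linear `ToyLayer` stands in for a learned world-model layer, and all names (`ToyLayer`, `hierarchical_step`) are hypothetical. It shows only the data flow the abstract describes: the low layer's reconstruction residual is passed upward, the high layer's prediction of that residual calibrates the foresight, and the calibrated foresight augments the raw observation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyLayer:
    """Hypothetical stand-in for one world-model layer.

    A fixed linear map plays the role of a learned reconstruction;
    a real layer would be a trained latent dynamics model.
    """
    def __init__(self, dim):
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def reconstruct(self, x):
        return x @ self.W

def hierarchical_step(obs, low, high):
    """One observation-augmentation step in the sketched hierarchy."""
    recon = low.reconstruct(obs)
    residual = obs - recon                           # what the low layer missed
    predicted_residual = high.reconstruct(residual)  # high layer models the residual
    foresight = recon + predicted_residual           # residual-calibrated foresight
    augmented_obs = np.concatenate([obs, foresight]) # augmented observation
    return augmented_obs, residual

dim = 4
low, high = ToyLayer(dim), ToyLayer(dim)
obs = rng.normal(size=dim)
aug, res = hierarchical_step(obs, low, high)
print(aug.shape)  # (8,)
```

Under this reading, each layer consumes the raw observation concatenated with a foresight signal that the layer above has corrected, which is one plausible way to realize the "residually connected visual planning representations" named in the TL;DR.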
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20195