Abstract: In Model-Based Reinforcement Learning (MBRL), an agent learns to make decisions by building a world model that predicts the environment's dynamics. The accuracy of this world model is crucial for generalization and sample efficiency. However, world models often focus on irrelevant, exogenous features at the expense of visually minor but task-critical information. We observe that important task-related information is typically associated with dynamic objects. To encourage the world model to focus on such information, we augment world model training with a temporal prediction loss in the embedding space as an auxiliary objective. Building our method on the DreamerV3 architecture, we improve sample efficiency and stability by learning better representations for world model and policy training. We evaluate our method on the Atari100k and Distracting Control Suite benchmarks, demonstrating significant improvements in world model quality and overall MBRL performance.
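The abstract's core idea, a temporal prediction loss in embedding space used as an auxiliary objective, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `TemporalPredictionHead`, the network shape, the detached target, and the weighting coefficient `beta` are all assumptions made here for clarity.

```python
# Illustrative sketch of a temporal prediction auxiliary loss in embedding
# space (hypothetical; not the paper's actual code). The head predicts the
# next-step embedding from the current embedding and action.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalPredictionHead(nn.Module):
    """Hypothetical predictor: (z_t, a_t) -> estimate of z_{t+1}."""

    def __init__(self, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_dim, hidden),
            nn.ELU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, z_t: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, a_t], dim=-1))


def temporal_prediction_loss(head: TemporalPredictionHead,
                             z_t: torch.Tensor,
                             a_t: torch.Tensor,
                             z_next: torch.Tensor) -> torch.Tensor:
    """MSE between the predicted and actual next embedding. The target is
    detached so the auxiliary gradient shapes the predictor and the current
    embedding, not the target embedding (a common design choice, assumed here)."""
    pred = head(z_t, a_t)
    return F.mse_loss(pred, z_next.detach())


if __name__ == "__main__":
    head = TemporalPredictionHead(embed_dim=32, action_dim=4)
    z_t, a_t = torch.randn(8, 32), torch.randn(8, 4)
    z_next = torch.randn(8, 32)
    aux = temporal_prediction_loss(head, z_t, a_t, z_next)
    beta = 0.1  # hypothetical auxiliary weight
    # In training, this term would be added to the world model loss:
    # total_loss = world_model_loss + beta * aux
    print(float(aux))
```

Because the loss operates in embedding space rather than pixel space, gradients flow toward features that help predict future states, which tends to emphasize dynamic, task-relevant objects over static background distractors.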
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Aleksandra_Faust1
Submission Number: 5002