Keywords: World Models, Latent Dynamics Models, Deep Reinforcement Learning
TL;DR: The paper introduces a novel stochastic gating mechanism to sparsely update the latent state in world models and a new environment to test long-term memory and exploration capabilities of RL agents.
Abstract: Latent dynamics models learn an abstract representation of an environment based on collected experience. Such models are at the core of recent advances in model-based reinforcement learning. For example, world models can imagine unseen trajectories, potentially improving data efficiency. Planning in the real world requires agents to understand long-term dependencies between actions and events, and to account for varying degrees of change, e.g. due to a change in background or viewpoint. Moreover, in a typical scene, only a subset of objects change their state. These changes are often quite sparse, which suggests incorporating such an inductive bias into a dynamics model. In this work, we introduce the variational sparse gating mechanism, which enables an agent to sparsely update a latent dynamics model's state. We also present a simplified version which, unlike prior models, has a single stochastic recurrent state. Finally, we introduce ShapeHerd, a new environment in which an agent needs to push shapes into a goal area. This environment is partially observable and requires models to remember previously observed objects and to explore the environment to discover unseen objects. Our experiments show that the proposed methods significantly outperform leading model-based reinforcement learning methods on this environment, while also yielding competitive performance on tasks from the DeepMind Control Suite.
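To make the sparse-update idea concrete, the following is a minimal sketch of a stochastic sparse gating step, not the paper's actual model: each latent dimension samples a Bernoulli gate that decides whether it is refreshed with a new candidate value or copied unchanged, so most dimensions stay fixed when gate probabilities are low. All names (`sparse_gated_update`, `W_g`, `W_c`, `b_g`) are hypothetical, and the training machinery (variational objective, gradient estimation through the discrete gates) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_gated_update(h_prev, x, W_g, b_g, W_c, rng):
    """One sparse gating step (illustrative sketch, not the paper's model).

    A per-dimension Bernoulli gate m chooses between a freshly computed
    candidate state and simply copying the previous state, so the latent
    state is updated sparsely when gate probabilities are small.
    """
    inp = np.concatenate([h_prev, x])
    p = sigmoid(W_g @ inp + b_g)     # per-dimension update probability
    m = rng.random(p.shape) < p      # stochastic binary gate sample
    cand = np.tanh(W_c @ inp)        # candidate new latent state
    h_new = np.where(m, cand, h_prev)  # update gated dims, copy the rest
    return h_new, m

# Toy dimensions: latent size 8, observation embedding size 4.
h = np.zeros(8)
x = rng.normal(size=4)
W_g = rng.normal(scale=0.1, size=(8, 12))
b_g = -2.0 * np.ones(8)              # bias gates toward "copy" (sparse updates)
W_c = rng.normal(scale=0.5, size=(8, 12))
h_new, mask = sparse_gated_update(h, x, W_g, b_g, W_c, rng)
```

Biasing the gate pre-activations negative (here via `b_g`) is one simple way to encode the prior that, at any step, only a few latent dimensions should change.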
Supplementary Material: zip