BLAST: Latent Dynamics Models from Bootstrapping

Date: 12 Oct 2021, 19:37 (modified: 02 Dec 2021, 03:59)
Venue: Deep RL Workshop NeurIPS 2021
Readers: Everyone
Keywords: representation learning, model-based reinforcement learning, reinforcement learning
TL;DR: BLAST is a latent dynamics model that learns without reconstruction losses by predicting its own latent representations.
Abstract: State-of-the-art world models such as DreamerV2 have significantly improved the capabilities of model-based reinforcement learning. However, these approaches typically rely on a reconstruction loss to shape their latent representations, which is known to fail in environments with high-fidelity visual observations. Previous work has found that performance can also drop dramatically when latent dynamics models are learned without a reconstruction loss, using only the signal provided by the reward. Inspired by the recent self-supervised learning method Bootstrap Your Own Latent, we present a simple set of modifications to DreamerV2 that removes its reliance on reconstruction. The combination, which we call BLAST, consists of adding a stop-gradient to the posterior, using a powerful auto-regressive model for the prior, and using a slowly updating target encoder. It allows the world model to learn from signals present in both the reward and the observations, improving efficiency on our tested environment and making the model significantly more robust to visual distractors.
Supplementary Material: zip
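
Illustrative sketch (not from the paper): the abstract names a slowly updating target encoder and a stop-gradient but gives no formulas. The snippet below shows one common way such a target is maintained, an exponential moving average (EMA) of the online parameters, together with the cosine-style loss used by Bootstrap Your Own Latent. The function names `ema_update` and `cosine_loss`, the `decay` value, and the choice of loss are all assumptions made for illustration; BLAST's actual update rule and objective may differ.

```python
import math


def ema_update(target, online, decay=0.99):
    """Move each target parameter a small step toward its online counterpart.

    Hypothetical helper: parameters are plain lists of floats here.
    """
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


def cosine_loss(prediction, target_latent):
    """BYOL-style loss, 2 - 2 * cosine similarity (0 when the vectors align).

    In an autodiff framework the target latent would be wrapped in a
    stop-gradient so that only the predicting branch receives gradients.
    Assumes neither vector is all zeros.
    """
    dot = sum(p * t for p, t in zip(prediction, target_latent))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target_latent))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)


# Toy usage: the target encoder trails the online encoder.
online_params = [1.0, 2.0]
target_params = [0.0, 0.0]
target_params = ema_update(target_params, online_params, decay=0.9)
```

With a decay close to 1 the target changes slowly, which is the usual motivation for this kind of bootstrapped target: the predicting branch chases a representation that is stable over many updates.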