Keywords: Exploration, Deep Reinforcement Learning, Representation Learning, Self-Supervised Learning
TL;DR: A robust exploration method based on future open-loop predictions at the latent level via diffusion models.
Abstract: We present Bootstrap your own Latents with Diffusion models for Exploration (BLaDE), a general approach for curiosity-driven exploration in complex, partially observable, and stochastic environments. BLaDE is a natural extension of Bootstrap Your Own Latents for Exploration (BYOL-Explore), a multi-step prediction-error method at the latent level that learns a world representation and the world dynamics and provides an intrinsic reward, all by optimizing a single prediction loss with no additional auxiliary objective. Unlike BYOL-Explore, which predicts future latents from past latents and future open-loop actions, BLaDE uses a diffusion model to predict future latents from past observations, future open-loop actions, and a noisy version of the future latents. Leaking information about the future latents makes it possible to control the variance of their distribution, which makes the method robust to stochastic traps. Our experiments on different noisy versions of Montezuma’s Revenge show that BLaDE handles stochasticity better than Random Network Distillation, Intrinsic Curiosity Module, and BYOL-Explore, without degrading the performance of BYOL-Explore on the non-noisy, fairly deterministic Montezuma’s Revenge.
Supplementary Material: zip
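
The role of the leaked noisy latent described in the abstract can be illustrated with a toy sketch. This is not the BLaDE implementation: the linear least-squares predictor stands in for the diffusion model, the mixing coefficient `alpha` and all function names are hypothetical, and the "stochastic trap" is modeled as a future latent that is pure noise, independent of the past. Under these assumptions, the prediction error (used here as the intrinsic reward) on the trap shrinks as more of the future latent is leaked.

```python
import numpy as np

def noisy_target(z_future, alpha, rng):
    # Assumed diffusion-style corruption: alpha=1 leaks the full target,
    # alpha=0 leaks nothing (the extra input is pure Gaussian noise).
    eps = rng.standard_normal(z_future.shape)
    return np.sqrt(alpha) * z_future + np.sqrt(1.0 - alpha) * eps

def fit_linear_predictor(inputs, targets):
    # Linear stand-in for the learned predictor (hypothetical simplification).
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights

def intrinsic_reward(inputs, targets, weights):
    # Squared prediction error per sample, used as the curiosity signal.
    return np.sum((inputs @ weights - targets) ** 2, axis=-1)

def trap_reward(alpha, n=5000, dim=4, seed=0):
    # Mean intrinsic reward on a toy stochastic trap for a given leak level.
    # The predictor is fit and evaluated on the same samples for brevity.
    rng = np.random.default_rng(seed)
    z_past = rng.standard_normal((n, dim))
    action = rng.standard_normal((n, 1))
    z_future = rng.standard_normal((n, dim))  # independent of the past
    leaked = noisy_target(z_future, alpha, rng)
    inputs = np.concatenate([z_past, action, leaked], axis=1)
    weights = fit_linear_predictor(inputs, z_future)
    return float(intrinsic_reward(inputs, z_future, weights).mean())

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.9):
        print(f"alpha={alpha:.1f}  mean intrinsic reward={trap_reward(alpha):.3f}")
```

In this toy setting, the residual error on the unpredictable target scales roughly with `1 - alpha`, so a predictor that sees a noisy copy of the future latent stops being rewarded for pure noise, while `alpha = 0` reduces to predicting from the past latent and action alone.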