Unlocking the Power of Representations in Long-term Novelty-based Exploration

Published: 28 Oct 2023, Last Modified: 07 Dec 2023, ALOE 2023 Poster
Keywords: Deep Reinforcement Learning, Exploration, Intrinsic Motivation, Representation Learning
TL;DR: We introduce a new novelty estimator for exploration in deep RL that can preserve long-term memory and be used with any representation learning technique.
Abstract: We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss that leverages masked transformer architectures for multi-step prediction; in conjunction with RECODE, it achieves a new state of the art on a suite of challenging 3D exploration tasks in DM-HARD-8. RECODE also sets a new state of the art in hard-exploration Atari games and is the first agent to reach the end screen in Pitfall!
Submission Number: 9