Keywords: Representation learning, self-supervised, exploration
Abstract: Learning reward-agnostic representations is an emerging paradigm in reinforcement learning. These representations can be leveraged for several purposes ranging from reward shaping to skill discovery. Nevertheless, in order to learn such representations, existing methods often rely on assuming uniform access to the state space. With such a privilege, the agent’s coverage of the environment can be limited which hurts the quality of the learned representations. In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space. Our method learns representations that constantly drive exploration while the data generated by the agent’s exploratory behavior drives the learning of better representations. We empirically validate our approach in goal-achieving tasks, demonstrating that the learned representation captures the dynamics of the environment, leads to more accurate value estimation, and to faster credit assignment, both when used for control and for reward shaping. Finally, the exploratory policy that emerges from our approach proves to be successful at continuous navigation tasks with sparse rewards.