- Keywords: reinforcement learning, exploration, curiosity
- TL;DR: Instead of rewarding agents for predicting the next state, reward them for taking actions that change their learned state representation.
- Abstract: Exploration in sparse-reward environments remains one of the key challenges of model-free reinforcement learning (RL). Rather than relying solely on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage the agent to explore. However, we show that existing methods fall short in procedurally generated environments, where an agent is unlikely to visit the same state more than once. We propose a novel type of intrinsic exploration bonus that rewards the agent for actions that change its learned state representation. We evaluate our method on multiple challenging procedurally generated tasks in MiniGrid, as well as on tasks used in prior curiosity-driven exploration work. Our experiments demonstrate that our approach is more sample efficient than existing exploration methods, particularly in procedurally generated MiniGrid environments. Furthermore, we analyze the learned behavior as well as the intrinsic reward received by our agent. In contrast to previous approaches, our intrinsic reward does not diminish over the course of training, and it rewards the agent substantially more for interacting with objects that it can control.
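
The bonus described in the abstract can be illustrated with a minimal sketch. It assumes the "change in learned state representation" is measured as the L2 distance between the embeddings of consecutive states; the abstract does not specify the exact form, so this choice, the function name `impact_bonus`, and the fixed random linear map `W` standing in for a learned encoder are all hypothetical.

```python
import numpy as np


def impact_bonus(phi_s, phi_next):
    """Intrinsic bonus: L2 distance between consecutive state embeddings.

    Assumption: the abstract only says the bonus rewards actions that change
    the learned state representation; the L2 norm is an illustrative choice.
    """
    diff = np.asarray(phi_next, dtype=float) - np.asarray(phi_s, dtype=float)
    return float(np.linalg.norm(diff))


# Hypothetical stand-in for a learned encoder: a fixed random linear map.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))


def embed(obs):
    """Map a raw observation (length-4 vector) to an 8-dimensional embedding."""
    return np.asarray(obs, dtype=float) @ W


obs = np.zeros(4)
obs_same = obs.copy()    # an action that leaves the observation unchanged
obs_moved = obs.copy()
obs_moved[0] = 1.0       # an action that changes the observation

r_noop = impact_bonus(embed(obs), embed(obs_same))
r_move = impact_bonus(embed(obs), embed(obs_moved))
print(r_noop, r_move)
```

In this sketch an action that leaves the observation unchanged earns no bonus, while an action that alters it earns a positive one. With a trained encoder, the size of the bonus would depend on which changes the representation has learned to capture.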