Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

Joshua Achiam; Shankar Sastry

22 May 2025 (modified: 23 Mar 2025) | ICLR 2017 Invite to Workshop | Readers: Everyone

Abstract: Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $\epsilon$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the other gives the $k$-step learning progress. We show that our incentives enable agents to succeed in a wide range of environments with high-dimensional state spaces and very sparse rewards, including continuous control tasks and games in the Atari RAM domain, outperforming several other heuristic exploration techniques.
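The two intrinsic rewards named in the abstract can be illustrated with a minimal sketch. It assumes surprisal is the negative log-likelihood of the observed next state under the learned model, and that $k$-step learning progress is the gain in log-likelihood between the current model and the one from $k$ updates earlier. The linear-Gaussian model, the function names, and the parameters `sigma` and `eta` are hypothetical choices made for illustration; they are not taken from this page and are not the authors' implementation.

```python
import numpy as np

def gaussian_log_prob(s_next, mean, sigma):
    """Log density of s_next under an isotropic Gaussian N(mean, sigma^2 I)."""
    d = s_next.shape[-1]
    sq_err = np.sum((s_next - mean) ** 2, axis=-1)
    return -0.5 * sq_err / sigma**2 - 0.5 * d * np.log(2.0 * np.pi * sigma**2)

def predict_mean(W, s, a):
    """Hypothetical linear dynamics model: predicted mean of s' is W @ [s; a]."""
    return np.concatenate([s, a], axis=-1) @ W.T

def surprisal_bonus(W, s, a, s_next, sigma=1.0):
    """Surprisal: -log P_model(s' | s, a). Large when the model predicts poorly."""
    return -gaussian_log_prob(s_next, predict_mean(W, s, a), sigma)

def learning_progress_bonus(W_new, W_old, s, a, s_next, sigma=1.0):
    """k-step learning progress: log-likelihood gain of the current model
    (W_new) over the model from k updates earlier (W_old)."""
    lp_new = gaussian_log_prob(s_next, predict_mean(W_new, s, a), sigma)
    lp_old = gaussian_log_prob(s_next, predict_mean(W_old, s, a), sigma)
    return lp_new - lp_old

def shaped_reward(r_ext, bonus, eta=0.1):
    """Extrinsic reward plus a scaled intrinsic bonus (eta is an assumed weight)."""
    return r_ext + eta * bonus
```

Under this sketch, a transition the model predicts badly earns a large surprisal bonus, while the learning-progress bonus is positive only where the model has recently improved. A policy optimizer would then be trained on `shaped_reward` in place of the raw environment reward.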

TL;DR: Learn a dynamics model and use it to make your agent boldly go where it has not gone before.

Conflicts: berkeley.edu

Keywords: Reinforcement Learning

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/surprise-based-intrinsic-motivation-for-deep/code)
