Abstract: We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and it occasionally completes the first level. This suggests that relatively simple methods that scale well can be sufficient to tackle challenging exploration problems.
Keywords: reinforcement learning, exploration, curiosity
TL;DR: A simple exploration bonus is introduced and achieves state of the art performance in 3 hard exploration Atari games.
Code: [openai/random-network-distillation](https://github.com/openai/random-network-distillation) + [20 community implementations](https://paperswithcode.com/paper/?openreview=H1lJJnR5Ym)
Data: [Arcade Learning Environment](https://paperswithcode.com/dataset/arcade-learning-environment), [URLB](https://paperswithcode.com/dataset/urlb)
Community Implementations: [10 code implementations](https://www.catalyzex.com/paper/exploration-by-random-network-distillation/code)
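
To make the abstract's description of the bonus concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the one-layer tanh networks, the dimensions, the learning rate, and the helper names (`make_net`, `forward`, `rnd_bonus`, `train_predictor`) are all hypothetical choices made for brevity. The only idea taken from the abstract is that the bonus is the prediction error of a trained network against the features of a fixed, randomly initialized network.

```python
import math
import random


def make_net(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim) for a one-layer tanh network (assumed toy architecture)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, obs):
    """Features of an observation: tanh(W @ obs), returned as a list."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in weights]


def rnd_bonus(target, predictor, obs):
    """Exploration bonus: squared error between predictor features and fixed target features."""
    t = forward(target, obs)
    p = forward(predictor, obs)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t))


def train_predictor(target, predictor, obs, lr=0.1):
    """One gradient step on the squared error; only the predictor is updated, the target stays fixed."""
    t = forward(target, obs)
    p = forward(predictor, obs)
    for i, row in enumerate(predictor):
        # Gradient of (p_i - t_i)^2 w.r.t. the pre-activation, with p_i = tanh(w_i . obs)
        grad_pre = 2.0 * (p[i] - t[i]) * (1.0 - p[i] ** 2)
        for j in range(len(row)):
            row[j] -= lr * grad_pre * obs[j]


rng = random.Random(0)
target = make_net(4, 8, rng)     # fixed random network, never trained
predictor = make_net(4, 8, rng)  # trained to imitate the target on visited observations

seen = [0.5, -0.2, 0.1, 0.9]     # hypothetical observation vector
before = rnd_bonus(target, predictor, seen)
for _ in range(500):
    train_predictor(target, predictor, seen)
after = rnd_bonus(target, predictor, seen)

print(f"bonus before training: {before:.4f}")
print(f"bonus after training:  {after:.4f}")
```

In this toy setup the bonus on a repeatedly visited observation shrinks as the predictor is fitted to it, which is the behaviour an exploration bonus relies on: observations the predictor has not been trained on are not pushed toward zero error in the same way.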