Assessing Generalization in Deep Reinforcement Learning

Charles Packer*; Katelyn Gao*; Jernej Kos; Philipp Krahenbuhl; Vladlen Koltun; Dawn Song

27 Sept 2018 (modified: 22 Jun 2025) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of state-of-the-art algorithms, including those that specifically tackle the problem of generalization. The experimental results indicate that in-distribution generalization may be within the capacity of current algorithms, while out-of-distribution generalization is an exciting challenge for future work.
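To make the distinction between in-distribution and out-of-distribution evaluation concrete, here is a minimal toy sketch. It is not the benchmark's code or protocol: the parameter ranges, the scalar "policy", and the helper names (`sample_param`, `toy_return`, `evaluate`) are all hypothetical stand-ins chosen only for illustration. The idea shown is that an agent tuned on environments drawn from one parameter range is scored once on fresh draws from that same range and once on draws from a disjoint range.

```python
import random

# Hypothetical parameter ranges (e.g. a physical constant of the environment).
TRAIN_RANGE = (0.8, 1.2)   # range seen during training
OOD_RANGE = (1.5, 2.0)     # disjoint range, never seen during training


def sample_param(low, high, rng):
    """Draw one environment parameter uniformly from [low, high]."""
    return rng.uniform(low, high)


def toy_return(policy_gain, param):
    """Stand-in for an episode return: best (zero) when the policy matches the environment."""
    return -abs(policy_gain - param)


def evaluate(policy_gain, low, high, episodes, rng):
    """Average toy return over environments drawn from [low, high]."""
    total = 0.0
    for _ in range(episodes):
        total += toy_return(policy_gain, sample_param(low, high, rng))
    return total / episodes


rng = random.Random(0)

# A "trained" policy, assumed here to be tuned to the centre of the training range.
policy_gain = sum(TRAIN_RANGE) / 2

in_dist = evaluate(policy_gain, *TRAIN_RANGE, episodes=1000, rng=rng)
out_dist = evaluate(policy_gain, *OOD_RANGE, episodes=1000, rng=rng)

print(f"in-distribution average return:     {in_dist:.3f}")
print(f"out-of-distribution average return: {out_dist:.3f}")
```

In this toy setup the out-of-distribution score is always lower by construction, because the policy is fixed and the test range does not overlap the training range. A real evaluation would replace `toy_return` with rollouts of a learned agent in an actual environment.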

Keywords: reinforcement learning, generalization, benchmark

TL;DR: We provide the first benchmark and common experimental protocol for investigating generalization in RL, and conduct a systematic evaluation of state-of-the-art deep RL algorithms.

Code: [sunblaze-ucb/rl-generalization](https://github.com/sunblaze-ucb/rl-generalization)

Data: [Arcade Learning Environment](https://paperswithcode.com/dataset/arcade-learning-environment), [MuJoCo](https://paperswithcode.com/dataset/mujoco), [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/assessing-generalization-in-deep/code)
