- Abstract: Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of state-of-the-art algorithms, including those that specifically tackle the problem of generalization. The experimental results indicate that in-distribution generalization may be within the capacity of current algorithms, while out-of-distribution generalization is an exciting challenge for future work.
- Keywords: reinforcement learning, generalization, benchmark
- TL;DR: We provide the first benchmark and common experimental protocol for investigating generalization in RL, and conduct a systematic evaluation of state-of-the-art deep RL algorithms.