A Study of Off-Policy Learning in Environments with Procedural Content Generation

Andy Ehrenberg; Robert Kirk; Minqi Jiang; Edward Grefenstette; Tim Rocktäschel

Published: 23 Apr 2022, Last Modified: 05 May 2023, ALOE@ICLR2022, Readers: Everyone

Keywords: reinforcement learning, off-policy algorithms, procedural content generation

TL;DR: Certain popular extensions to DQN, such as prioritized experience replay (PER), do not improve performance on environments with procedural content generation.

Abstract: Environments with procedural content generation (PCG environments) are useful for assessing the generalization capacity of Reinforcement Learning (RL) agents. A growing body of work focuses on generalization in RL in PCG environments, with many methods built on top of on-policy algorithms. Off-policy methods, by contrast, have received less attention. Motivated by this discrepancy, we examine how Deep Q Networks (DQN; Mnih et al., 2013) perform on the Procgen benchmark (Cobbe et al., 2020) and measure the impact of various additions to DQN on performance. We find that some popular techniques that have improved DQN on benchmarks like the Arcade Learning Environment (ALE; Bellemare et al., 2015) do not carry over to Procgen, implying that some research has overfit to tasks that lack diversity and has failed to consider the importance of generalization.
