Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

Iryna Korshunova; Minqi Jiang; Jack Parker-Holder; Tim Rocktäschel; Edward Grefenstette

12 Oct 2021 (modified: 05 May 2023) · Deep RL Workshop NeurIPS 2021 · Readers: Everyone

Keywords: procedurally generated environments, curriculum learning, meta-learning, Procgen benchmark

TL;DR: Dispersion of returns can be used as an alternative to TD errors to score procedurally generated levels for future learning potential

Abstract: Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels, weighted by a function of the recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD error scores.
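
To make the idea concrete, here is a minimal sketch of how a return-dispersion score might feed a replay distribution over levels. The abstract does not specify the dispersion measure or the weighting function, so both choices below are assumptions: dispersion is taken to be the population standard deviation of recent episodic returns per level, and levels are weighted by a rank-based rule with a temperature parameter. The function names, level names, and return values are hypothetical and chosen only for illustration.

```python
import statistics


def dispersion_scores(returns_by_level):
    """Score each level by the standard deviation of its recent returns.

    Assumption: population standard deviation stands in for "dispersion";
    the abstract does not say which measure the authors use.
    """
    return {
        level: statistics.pstdev(returns) if len(returns) > 1 else 0.0
        for level, returns in returns_by_level.items()
    }


def rank_replay_probs(scores, temperature=1.0):
    """Turn scores into replay probabilities via rank-based weighting.

    Assumption: weight = (1 / rank) ** (1 / temperature), with rank 1 given
    to the highest-scoring level, normalized to sum to one.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    weights = {
        level: (1.0 / rank) ** (1.0 / temperature)
        for rank, level in enumerate(ordered, start=1)
    }
    total = sum(weights.values())
    return {level: weight / total for level, weight in weights.items()}


# Hypothetical recent episodic returns for three levels.
returns_by_level = {
    "level_a": [10.0, 10.0, 10.0],      # always solved: zero dispersion
    "level_b": [0.0, 10.0, 0.0, 10.0],  # mixed outcomes: high dispersion
    "level_c": [0.0, 0.0, 1.0],         # mostly failing: low dispersion
}

scores = dispersion_scores(returns_by_level)
probs = rank_replay_probs(scores, temperature=0.5)
```

Under these assumptions, the level with mixed outcomes receives the highest replay probability, while the level the agent always solves receives the lowest. Lowering `temperature` concentrates sampling on the top-ranked levels.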
