Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

12 Oct 2021, 19:37 (modified: 17 Nov 2021, 19:06)
Deep RL Workshop NeurIPS 2021
Readers: Everyone
Keywords: procedurally generated environments, curriculum learning, meta-learning, Procgen benchmark
TL;DR: Dispersion of returns can be used as an alternative to TD errors to score procedurally generated levels for future learning potential
Abstract: Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels weighted by a function of recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD error scores.
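The idea of scoring levels by return dispersion and then sampling in proportion to those scores can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the population standard deviation as the dispersion measure, the rank-based weighting with a `temperature` parameter, and the names `dispersion_scores`, `replay_probabilities`, and `sample_level`.

```python
import random
import statistics


def dispersion_scores(returns_by_level):
    """Score each level by the population standard deviation of its recent returns.

    Assumption: standard deviation stands in for "dispersion"; the paper may
    use a different statistic. Levels with no recorded returns score 0.0.
    """
    return {
        level: statistics.pstdev(returns) if returns else 0.0
        for level, returns in returns_by_level.items()
    }


def replay_probabilities(scores, temperature=1.0):
    """Convert scores into sampling probabilities via rank-based weighting.

    Assumption: the highest-scoring level gets rank 1 and each level's weight
    is (1 / rank) ** (1 / temperature), normalized to sum to 1.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    weights = {
        level: (1.0 / (index + 1)) ** (1.0 / temperature)
        for index, level in enumerate(ranked)
    }
    total = sum(weights.values())
    return {level: weight / total for level, weight in weights.items()}


def sample_level(probabilities, rng=random):
    """Draw one level according to the given probabilities."""
    levels = list(probabilities)
    return rng.choices(levels, weights=[probabilities[lv] for lv in levels], k=1)[0]


# Hypothetical recent episode returns for three levels.
returns_by_level = {
    "level_a": [1.0, 1.0, 1.0],   # constant returns, no dispersion
    "level_b": [0.0, 10.0, 5.0],  # widely varying returns
    "level_c": [2.0, 3.0, 2.5],   # mildly varying returns
}

scores = dispersion_scores(returns_by_level)
probabilities = replay_probabilities(scores)
next_level = sample_level(probabilities)
```

Under these assumptions, a level whose returns vary widely is replayed more often than one whose returns are already stable, which is the sense in which dispersion acts as a stand-in for learning potential.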