Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

Published: 18 Oct 2021, Last Modified: 05 May 2023 · ICBINB@NeurIPS 2021 Poster
Keywords: reinforcement learning, procedurally generated environments, curriculum learning, Procgen benchmark
TL;DR: Dispersion of returns can be used as an alternative to TD errors to score procedurally generated levels for future learning potential
Abstract: Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels weighted by a function of recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD error scores.
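To make the proposed criterion concrete, here is a minimal sketch of return-dispersion-based level prioritization. All class and method names are hypothetical, and the rank-based sampling scheme follows the general PLR recipe rather than the authors' exact implementation; the dispersion score used here is the sample standard deviation of a window of recent episodic returns.

```python
import random
from collections import deque

class DispersionPrioritizedReplay:
    """Hypothetical sketch: score each level by the standard deviation
    of its recent episodic returns, then sample levels with rank-based
    prioritization in the style of PLR. Not the authors' exact method."""

    def __init__(self, level_ids, window=10, temperature=1.0):
        # Keep a sliding window of recent returns per level.
        self.returns = {lvl: deque(maxlen=window) for lvl in level_ids}
        self.temperature = temperature

    def record(self, level_id, episode_return):
        """Store the return of a finished episode on the given level."""
        self.returns[level_id].append(episode_return)

    def score(self, level_id):
        """Sample standard deviation of recent returns; under-visited
        levels get infinite priority so they are explored first."""
        rets = self.returns[level_id]
        if len(rets) < 2:
            return float("inf")
        mean = sum(rets) / len(rets)
        var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
        return var ** 0.5

    def sample(self):
        """Sample a level, weighting by 1 / rank^(1/temperature),
        where rank 1 is the highest-dispersion level."""
        levels = list(self.returns)
        scores = [self.score(lvl) for lvl in levels]
        order = sorted(range(len(levels)), key=lambda i: -scores[i])
        ranks = [0] * len(levels)
        for rank, i in enumerate(order, start=1):
            ranks[i] = rank
        weights = [1.0 / (r ** (1.0 / self.temperature)) for r in ranks]
        return random.choices(levels, weights=weights, k=1)[0]
```

The intuition: a level whose returns swing widely under the current policy is one the agent has not mastered, so it is ranked ahead of levels with stable returns, regardless of whether those returns are high or low.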
Category: Negative result: I would like to share my insights and negative results on this topic with the community. Stuck paper: I hope to get ideas in this workshop that help me get unstuck and improve this paper.