Curriculum-guided Hindsight Experience Replay

Meng Fang, Tianyi Zhou, Yali Du, Lei Han, Zhengyou Zhang

06 Sept 2019 (modified: 05 May 2023) · NeurIPS 2019
Abstract: In off-policy deep reinforcement learning, it is usually hard to collect sufficient successful experiences with positive rewards to learn from. Hindsight experience replay (HER) enables an agent to also learn from failures by treating the achieved state of a failed experience as a pseudo goal. However, not all failed experiences are equally useful at different learning stages, so it is inefficient to replay all of them or to subsample them uniformly in HER. In this paper, we propose to 1) adaptively select failed experiences for replay according to their proximity to the true goal and the curiosity of exploration over diverse pseudo goals, and 2) smoothly vary the relative weight of proximity and curiosity/diversity from earlier to later learning episodes. We use a strategy that imitates human learning: it enforces more curiosity in earlier stages and gradually shifts toward more proximity later. This ``Goal-and-Curiosity-driven Curriculum (GCC) Learning'' leads to ``Curriculum-guided HER (CHER)'', which adaptively and dynamically controls the exploration-exploitation trade-off during the learning process. In experiments on robotic manipulation tasks, we show that CHER is significantly more efficient than HER in practice.
Code Link: https://github.com/mengf1/CHER
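The selection idea in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation (see the code link above for that): each failed experience's achieved goal is scored by a weighted mix of proximity to the true goal and diversity relative to already-selected goals, with the weight shifting from diversity (curiosity) toward proximity as training progresses, and the top-k goals are picked greedily. The function name, the linear curriculum weight, and the nearest-neighbor diversity term are all assumptions for illustration.

```python
import numpy as np

def select_replay_goals(achieved_goals, true_goal, episode, total_episodes, k=8):
    """Hypothetical sketch of CHER-style pseudo-goal selection.

    Scores each achieved goal of a failed experience by a weighted sum of
    proximity to the true goal and diversity among the goals selected so
    far, then greedily picks the k highest-scoring ones. The curriculum
    weight w moves from 0 (pure curiosity/diversity) early in training to
    1 (pure proximity) late in training.
    """
    achieved_goals = np.asarray(achieved_goals, dtype=float)
    true_goal = np.asarray(true_goal, dtype=float)

    # Curriculum weight: small early (favor diversity), near 1 late (favor proximity).
    w = episode / total_episodes

    # Proximity: negative Euclidean distance to the true goal (higher = closer).
    proximity = -np.linalg.norm(achieved_goals - true_goal, axis=1)

    selected, remaining = [], list(range(len(achieved_goals)))
    for _ in range(min(k, len(remaining))):
        best, best_score = None, -np.inf
        for i in remaining:
            if selected:
                # Diversity: distance to the nearest already-selected goal.
                div = min(np.linalg.norm(achieved_goals[i] - achieved_goals[j])
                          for j in selected)
            else:
                div = 0.0
            score = w * proximity[i] + (1.0 - w) * div
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

Late in training (w near 1) the selection collapses to the goals closest to the true goal, while early on (w near 0) it spreads the picks over diverse achieved states, mirroring the curiosity-to-proximity schedule described in the abstract.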
CMT Num: 6872