Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Published: 09 Nov 2021, Last Modified: 05 May 2023 · NeurIPS 2021 Poster · Readers: Everyone
Keywords: Meta Reinforcement Learning, Meta-RL, Deep RL, Hindsight, Sparse Reward
TL;DR: Meta-RL algorithms struggle in sparse reward environments and often require a dense reward function for training; we fix this by applying ideas from Hindsight Experience Replay to meta-RL.
Abstract: Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks; however, current meta-RL approaches struggle to learn in sparse reward environments. Although existing meta-RL algorithms can learn strategies for adapting to new sparse reward tasks, the adaptation strategies themselves are learned using hand-shaped reward functions, or require simple environments where random exploration is sufficient to encounter the sparse reward. In this paper we present a formulation of hindsight relabelling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward. We demonstrate the effectiveness of our approach on a suite of challenging sparse reward environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function.
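For readers unfamiliar with hindsight relabelling, the sketch below illustrates the general idea in a goal-reaching setting: a trajectory that fails its original task is stored again with the task replaced by an outcome it actually achieved, and the sparse reward is recomputed under that new task. This is a minimal, generic sketch in the spirit of Hindsight Experience Replay, not the paper's implementation; the field names (`obs`, `achieved`, `goal`), the "final state" relabelling strategy, and the `sparse_reward` threshold are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved, goal, eps=0.05):
    """Sparse goal-reaching reward: 1 if within eps of the goal, else 0."""
    return float(np.linalg.norm(achieved - goal) <= eps)

def relabel_trajectory(trajectory, eps=0.05):
    """Relabel a failed trajectory with a task it actually solved.

    `trajectory` is a list of dicts with keys 'obs', 'action', 'achieved',
    and 'goal' -- hypothetical field names, not the paper's data structures.
    """
    # Treat the final achieved state as if it had been the intended goal
    # (the "final" strategy from Hindsight Experience Replay).
    new_goal = trajectory[-1]['achieved']
    relabelled = []
    for step in trajectory:
        relabelled.append({
            'obs': step['obs'],
            'action': step['action'],
            'achieved': step['achieved'],
            'goal': new_goal,  # swap in the hindsight task
            'reward': sparse_reward(step['achieved'], new_goal, eps),  # recompute sparse reward
        })
    return relabelled
```

In a meta-RL context, the relabelled trajectory would be stored in the replay buffer under the new task so that the meta-learner receives non-zero reward signal even when the original sparse task was never solved during exploration.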
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.