USHER: Unbiased Sampling for Hindsight Experience Replay

Liam Schramm; Yunfu Deng; Edgar Granados; Abdeslam Boularias

Published: 10 Sept 2022, Last Modified: 27 Apr 2025

Venue: CoRL 2022 Poster

Readers: Everyone

Keywords: Reinforcement Learning, Multi-goal reinforcement learning, Reinforcement learning theory

TL;DR: We derive a provably unbiased variant of Hindsight Experience Replay without sacrificing HER's low variance or high sample efficiency.

Abstract: Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows both for a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
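
To make the relabeling idea in the abstract concrete, the following is a minimal sketch of plain hindsight relabeling, not of USHER's importance-sampling correction. The integer states, the transition layout, the reward convention (0 on success, -1 otherwise), and the function names are all assumptions chosen for illustration; none of them come from the paper or its code.

```python
from typing import List, Tuple

# Hypothetical transition layout (an assumption): (state, action, next_state, goal)
Transition = Tuple[int, int, int, int]
# Relabeled transition with the reward appended
LabeledTransition = Tuple[int, int, int, int, float]


def sparse_reward(next_state: int, goal: int) -> float:
    """Assumed sparse reward: 0.0 when the goal is reached, -1.0 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_state(trajectory: List[Transition]) -> List[LabeledTransition]:
    """Replace every transition's goal with the state the trajectory ended in.

    This is plain hindsight relabeling. It does not apply any bias correction.
    """
    hindsight_goal = trajectory[-1][2]  # next_state of the last transition
    return [
        (s, a, s2, hindsight_goal, sparse_reward(s2, hindsight_goal))
        for (s, a, s2, _original_goal) in trajectory
    ]


# A trajectory that aimed for goal 5 but only reached state 3.
trajectory = [(0, 1, 1, 5), (1, 1, 2, 5), (2, 1, 3, 5)]

original_rewards = [sparse_reward(s2, g) for (_s, _a, s2, g) in trajectory]
relabeled = relabel_with_final_state(trajectory)
relabeled_rewards = [t[4] for t in relabeled]

print(original_rewards)   # no success signal under the original goal
print(relabeled_rewards)  # the final step now counts as a success
```

Under the original goal the trajectory carries no success signal, while the relabeled copy ends in a success. Because the substituted goal is chosen from what actually happened, relabeled data over-represents favorable outcomes in a stochastic environment, which is the bias the abstract refers to.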

Student First Author: yes

Supplementary Material: zip

Code: https://github.com/schrammlb2/USHER_Implementation

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/usher-unbiased-sampling-for-hindsight/code)
