Experience Replay with Likelihood-free Importance Weights

Samarth Sinha; Jiaming Song; Animesh Garg; Stefano Ermon

28 Sept 2020 (modified: 12 Oct 2025), ICLR 2021 Conference Blind Submission, Readers: Everyone

Keywords: Experience Replay, Off-Policy Optimization, Deep Reinforcement Learning

Abstract: The use of past experiences to accelerate temporal difference (TD) learning of value functions, or experience replay, is a key component in deep reinforcement learning. In this work, we propose to reweight experiences based on their likelihood under the stationary distribution of the current policy, and justify this with a contraction argument over the Bellman evaluation operator. The resulting TD objective encourages small approximation errors on the value function over frequently encountered states. To balance bias and variance in practice, we use a likelihood-free density ratio estimator between on-policy and off-policy experiences, and use the ratios as the prioritization weights. We apply the proposed approach empirically on three competitive methods, Soft Actor Critic (SAC), Twin Delayed Deep Deterministic policy gradient (TD3) and Data-regularized Q (DrQ), over 11 tasks from OpenAI gym and DeepMind control suite. We achieve superior sample complexity on 35 out of 45 method-task combinations compared to the best baseline and similar sample complexity on the remaining 10.
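As a rough illustration of the reweighting idea in the abstract, the sketch below estimates a density ratio between near-on-policy samples and replay-buffer samples without evaluating either density, then uses the self-normalized ratios as weights on a squared TD error. It swaps in a plain logistic-regression classifier as the ratio estimator. That choice, the function names `fit_density_ratio` and `weighted_td_loss`, and the hyperparameters are all assumptions made for illustration, since the abstract does not specify the paper's estimator or training details.

```python
import numpy as np


def fit_density_ratio(on_policy, off_policy, steps=500, lr=0.1):
    """Fit a logistic classifier (on-policy = 1, off-policy = 0) and
    return a function estimating p_on(x) / p_off(x).

    Hypothetical helper: a classifier-based stand-in for a
    likelihood-free density ratio estimator.
    """
    x = np.vstack([on_policy, off_policy])
    y = np.concatenate([np.ones(len(on_policy)), np.zeros(len(off_policy))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(on-policy | x)
        grad = p - y                            # gradient of the log loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    # Correct for unequal class sizes: p/(1-p) estimates (n_on/n_off) * ratio.
    log_prior = np.log(len(off_policy) / len(on_policy))

    def ratio(batch):
        return np.exp(batch @ w + b + log_prior)

    return ratio


def weighted_td_loss(q_values, td_targets, weights):
    """Squared TD error averaged with self-normalized importance weights."""
    weights = weights / weights.sum()
    return float(np.sum(weights * (q_values - td_targets) ** 2))
```

In use, one would call `fit_density_ratio` on features of recent (approximately on-policy) transitions and of transitions drawn from the full replay buffer, then pass `ratio(batch)` as `weights` when computing the critic loss on a replay batch. Self-normalizing the weights is one common way to limit variance, assumed here rather than taken from the source.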

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: A simple approach that improves deep actor-critic methods (SAC, TD3, DrQ) by appropriately reweighting the experience replay buffer

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/experience-replay-with-likelihood-free/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=EFNlLUfIY4
