Reliability-Adjusted Prioritized Experience Replay

ICLR 2026 Conference Submission 17771 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Deep Reinforcement Learning, Temporal Difference Learning, Experience Replay
TL;DR: We present Reliability-adjusted Prioritized Experience Replay, which boosts data efficiency over Prioritized Experience Replay by weighting samples with a novel TD-error reliability measure, achieving superior results on control tasks and Atari.
Abstract: Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences have been sampled uniformly from the replay buffer, regardless of differences in experience-specific learning potential. To sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER based on a novel measure of temporal difference (TD) error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms both uniform experience replay and PER across a diverse set of environments, including several classic control tasks and the Atari-10 benchmark, which approximates the median score on the Atari-57 benchmark to within one percent of variance.
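The abstract does not spell out the reliability measure itself, so the following is only a minimal sketch of how a reliability-adjusted priority could be wired into a standard PER-style buffer. It assumes the priority is |TD error| scaled by a per-transition reliability weight in [0, 1]; the class name `ReliabilityAdjustedReplayBuffer`, the `reliability` argument, and the exact priority formula are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

class ReliabilityAdjustedReplayBuffer:
    """Sketch of a PER-style buffer whose priorities are scaled by a
    per-transition reliability weight (hypothetical form; the paper's
    actual reliability measure is defined in the full text)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha            # PER priority exponent
        self.eps = eps                # keeps priorities strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition, td_error, reliability=1.0):
        # Assumed priority: |TD error| scaled by a reliability weight in [0, 1].
        priority = (reliability * abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Standard PER importance-sampling weights to correct the bias
        # introduced by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, reliabilities):
        # Refresh priorities after a learning step, again scaling by reliability.
        for i, delta, r in zip(idx, td_errors, reliabilities):
            self.priorities[i] = (r * abs(delta) + self.eps) ** self.alpha
```

Apart from the reliability factor, sampling and bias correction follow the usual PER recipe, so the sketch reduces to standard PER when every reliability weight is 1.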
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17771