Keywords: Prioritized Replay, Uncertainty, Distributional RL, Ensembles
Abstract: Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions are prioritized based on their temporal-difference (TD) error. However, this approach is prone to favoring noisy transitions, even when the value estimate already closely approximates the target mean. This phenomenon resembles the _noisy TV_ problem postulated in the exploration literature, in which exploration-guided agents get stuck by mistaking noise for novelty. To mitigate the disruptive effects of noise in value estimation, we propose using epistemic uncertainty to guide the prioritization of transitions from the replay buffer. Epistemic uncertainty quantifies the uncertainty that can be reduced by learning, so prioritizing by it reduces the sampling of transitions generated by unpredictable random processes. We first illustrate the benefits of epistemic uncertainty prioritized replay in two tabular toy models: a simple multi-armed bandit task and a noisy gridworld. Subsequently, we evaluate our prioritization scheme on the Atari suite, where it outperforms quantile regression deep Q-learning benchmarks, thus forging a path for the use of epistemic uncertainty prioritized replay in reinforcement learning agents.
Submission Number: 45
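To make the prioritization idea concrete, below is a minimal sketch (not the authors' implementation) of a replay buffer whose priorities come from an ensemble-disagreement proxy for epistemic uncertainty, with sampling proportional to priority as in standard prioritized replay. All class and method names, and the use of ensemble standard deviation as the uncertainty signal, are illustrative assumptions.

```python
import numpy as np


class UncertaintyPrioritizedReplay:
    """Sketch of a replay buffer prioritized by an epistemic-uncertainty proxy:
    the disagreement (standard deviation) across an ensemble of Q-value estimates.
    Hypothetical interface; the paper's exact scheme may differ."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # exponent controlling how strongly priorities skew sampling
        self.transitions = []       # stored (s, a, r, s_next, done) tuples
        self.priorities = []        # one priority per stored transition

    def add(self, transition, ensemble_q_values):
        # Epistemic-uncertainty proxy: spread of the ensemble's Q-estimates for the
        # taken action. Transitions that are merely noisy but predictable should
        # yield low disagreement and hence low priority.
        priority = float(np.std(ensemble_q_values)) + 1e-6
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority^alpha.
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx], idx

    def update_priorities(self, indices, ensemble_q_values_per_transition):
        # After a learning step, refresh priorities with the new ensemble disagreement,
        # so transitions whose uncertainty has been reduced are sampled less often.
        for i, q_vals in zip(indices, ensemble_q_values_per_transition):
            self.priorities[i] = float(np.std(q_vals)) + 1e-6
```

In this sketch, replacing the ensemble standard deviation with the absolute TD error would recover conventional prioritized replay, which is the baseline behavior the abstract contrasts against.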