Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Pierluca D'Oro; Max Schwarzer; Evgenii Nikishin; Pierre-Luc Bacon; Marc G Bellemare; Aaron Courville

08 Oct 2022 (modified: 22 Jun 2025), Deep RL Workshop 2022, Readers: Everyone

Keywords: reinforcement learning, sample efficiency, resets

TL;DR: The combination of a large number of updates and resets drastically improves the sample efficiency of deep RL algorithms.

Abstract: Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improving the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance on the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.
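To make the two ingredients named in the abstract concrete, here is a minimal sketch of a training loop that performs `replay_ratio` parameter updates per environment interaction and periodically resets the parameters while keeping the collected data. It is not the authors' implementation: the scalar "agent", the noisy stand-in for an environment, the function `train`, and the choice to measure the reset interval in updates (`reset_every_updates`) are all assumptions made for illustration.

```python
import random
from collections import deque


def init_params():
    # Hypothetical agent: a single scalar weight.
    return {"w": 0.0}


def train(env_steps, replay_ratio, reset_every_updates, seed=0):
    """Toy loop: `replay_ratio` updates per environment step, with full resets.

    Everything here is an illustrative placeholder, not the paper's algorithm.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)  # replay buffer, preserved across resets
    params = init_params()
    n_updates = 0
    n_resets = 0

    for _ in range(env_steps):
        # One environment interaction (stand-in: a noisy sample around 1.0).
        buffer.append(1.0 + rng.gauss(0.0, 0.1))

        # Replay ratio: this many parameter updates per interaction.
        for _ in range(replay_ratio):
            target = rng.choice(buffer)
            params["w"] += 0.1 * (target - params["w"])
            n_updates += 1

            # Full reset (assumed schedule): re-initialize the parameters
            # every `reset_every_updates` updates, keeping the buffer.
            if n_updates % reset_every_updates == 0:
                params = init_params()
                n_resets += 1

    return params, n_updates, n_resets, len(buffer)


if __name__ == "__main__":
    params, n_updates, n_resets, buffer_size = train(
        env_steps=100, replay_ratio=8, reset_every_updates=300
    )
    print(n_updates, n_resets, buffer_size)
```

Raising `replay_ratio` multiplies the number of updates without requiring more environment interactions, which is the scaling axis the abstract refers to. A partial reset would re-initialize only some of the parameters rather than all of them.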

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/sample-efficient-reinforcement-learning-by/code)
