Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Experience replay, prioritized sampling, model-based reinforcement learning, Dyna architecture
Abstract: Prioritized experience replay (ER) has attracted great attention; however, there is little theoretical understanding of why such prioritization strategies help. In this work, we revisit prioritized ER and, in an idealized setting, show its equivalence to minimizing a cubic power loss, providing theoretical insight into why it improves upon uniform sampling. This equivalence highlights two limitations of current prioritized ER methods: insufficient coverage of the sample space and outdated priorities of stored transitions. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high-priority states using gradient ascent. Under certain conditions, we prove that the hypothetical experiences generated from these states are sampled proportionally to approximately true priorities. We also characterize the distance between the sampling distribution of our method and the true prioritized sampling distribution. Our experiments on both benchmark and application-oriented domains show that our approach achieves superior performance over baselines.
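The claimed equivalence can be sketched in one line. Assuming priorities proportional to the absolute TD error $|\delta_i|$ over a fixed buffer of $n$ transitions (the idealized setting referred to in the abstract), the expected squared-loss gradient under prioritized sampling matches, up to a constant factor, the uniform-sampling gradient of a cubic power loss:

```latex
% Idealized setting: fixed buffer of n transitions, priorities p_i \propto |\delta_i|.
\mathbb{E}_{i \sim p}\!\left[ \nabla_\theta \tfrac{1}{2}\delta_i^2 \right]
  = \sum_{i=1}^{n} \frac{|\delta_i|}{\sum_{j} |\delta_j|}\, \delta_i \nabla_\theta \delta_i
  \;\propto\; \frac{1}{n} \sum_{i=1}^{n} |\delta_i|\, \delta_i \nabla_\theta \delta_i
  = \mathbb{E}_{i \sim \mathcal{U}}\!\left[ \nabla_\theta \tfrac{1}{3}|\delta_i|^3 \right],
% using \nabla_\theta \tfrac{1}{3}|\delta|^3 = |\delta|\,\delta\,\nabla_\theta \delta.
```

(The sketch treats $\delta_i$ as fully differentiable in $\theta$; in practice TD methods use a semi-gradient, which does not change the proportionality.) Both limitations follow from this view: priorities computed when a transition was stored go stale as $\theta$ changes, and sampling only stored transitions cannot cover high-priority regions the agent has not visited.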
One-sentence Summary: We provide a theoretical explanation of why prioritized experience replay helps, identify its limitations, and propose new algorithms that address them.
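A minimal sketch of the proposed search step, assuming a differentiable value network and a learned one-step model; all names here (`value_net`, `model`, `search_high_priority_states`) are illustrative, not the authors' code:

```python
import torch

def search_high_priority_states(value_net, model, seed_states,
                                steps=20, lr=0.1, gamma=0.99):
    """Hill-climb in state space toward high-priority states.

    Illustrative sketch only: the TD error magnitude is treated as a
    differentiable function of the state, and gradient ascent moves seed
    states toward regions of high priority. `model(s)` is assumed to
    return a predicted reward and next state; both modules are assumed
    to be differentiable torch networks.
    """
    s = seed_states.clone().requires_grad_(True)
    opt = torch.optim.SGD([s], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        r, s_next = model(s)                      # hypothetical learned one-step model
        td_error = r + gamma * value_net(s_next) - value_net(s)
        priority = td_error.pow(2).sum()          # large |TD error| = high priority
        (-priority).backward()                    # negate so SGD performs ascent
        opt.step()
    return s.detach()
```

Hypothetical experiences generated from the returned states would then be planned over in a Dyna-style loop; because priorities are evaluated at generation time rather than read from stored values, they cannot go stale the way buffer priorities do.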
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: 1 code implementation ([CatalyzeX](https://www.catalyzex.com/paper/arxiv:2007.09569/code))
Reviewed Version (pdf): https://openreview.net/references/pdf?id=YkPN4ts98o