Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Experience replay, prioritized sampling, model-based reinforcement learning, Dyna architecture
Abstract: Prioritized experience replay (ER) has attracted great attention; however, there is little theoretical understanding of why such prioritization strategies help. In this work, we revisit prioritized ER and, in an idealized setting, show its equivalence to minimizing a cubic power loss, providing theoretical insight into why it improves upon uniform sampling. This equivalence highlights two limitations of current prioritized ER methods: insufficient coverage of the sample space and outdated priorities of stored transitions. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high-priority states using gradient ascent. Under certain conditions, we prove that the hypothetical experiences generated from these states are sampled proportionally to approximately true priorities. We also characterize the distance between the sampling distribution of our method and the true prioritized sampling distribution. Our experiments on both benchmark and application-oriented domains show that our approach achieves superior performance over baselines.
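The claimed equivalence can be sketched in one line. Assuming priorities proportional to the absolute TD error $|\delta_i|$ over a fixed buffer of $n$ transitions (the idealized setting referred to in the abstract), the expected squared-loss gradient under prioritized sampling matches, up to a constant factor, the uniform-sampling gradient of a cubic power loss:

```latex
% Idealized setting: fixed buffer of n transitions, priorities p_i \propto |\delta_i|.
\mathbb{E}_{i \sim p}\!\left[ \nabla_\theta \tfrac{1}{2}\delta_i^2 \right]
  = \sum_{i=1}^{n} \frac{|\delta_i|}{\sum_{j} |\delta_j|}\, \delta_i \nabla_\theta \delta_i
  \;\propto\; \frac{1}{n} \sum_{i=1}^{n} |\delta_i|\, \delta_i \nabla_\theta \delta_i
  = \mathbb{E}_{i \sim \mathcal{U}}\!\left[ \nabla_\theta \tfrac{1}{3}|\delta_i|^3 \right],
% using \nabla_\theta \tfrac{1}{3}|\delta|^3 = |\delta|\,\delta\,\nabla_\theta \delta.
```

(The sketch treats $\delta_i$ as fully differentiable in $\theta$; in practice TD methods use a semi-gradient, which does not change the proportionality.) Both limitations follow from this view: priorities computed when a transition was stored go stale as $\theta$ changes, and sampling only stored transitions cannot cover high-priority regions the agent has not visited.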
One-sentence Summary: We provide a theoretical explanation of why prioritized experience replay helps, identify its limitations, and propose new algorithms that address them.
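A minimal sketch of the proposed search step, assuming a differentiable value network and a learned one-step model; all names here (`value_net`, `model`, `search_high_priority_states`) are illustrative, not the authors' code:

```python
import torch

def search_high_priority_states(value_net, model, seed_states,
                                steps=20, lr=0.1, gamma=0.99):
    """Hill-climb in state space toward high-priority states.

    Illustrative sketch only: the TD error magnitude is treated as a
    differentiable function of the state, and gradient ascent moves seed
    states toward regions of high priority. `model(s)` is assumed to
    return a predicted reward and next state; both modules are assumed
    to be differentiable torch networks.
    """
    s = seed_states.clone().requires_grad_(True)
    opt = torch.optim.SGD([s], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        r, s_next = model(s)                      # hypothetical learned one-step model
        td_error = r + gamma * value_net(s_next) - value_net(s)
        priority = td_error.pow(2).sum()          # large |TD error| = high priority
        (-priority).backward()                    # negate so SGD performs ascent
        opt.step()
    return s.detach()
```

Hypothetical experiences generated from the returned states would then be planned over in a Dyna-style loop; because priorities are evaluated at generation time rather than read from stored values, they cannot go stale the way buffer priorities do.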
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: 1 code implementation ([CatalyzeX](https://www.catalyzex.com/paper/arxiv:2007.09569/code))
Reviewed Version (pdf): https://openreview.net/references/pdf?id=YkPN4ts98o