Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

15 Feb 2018 (modified: 21 Apr 2024) · ICLR 2018 Conference Blind Submission
Abstract: We propose Episodic Backward Update, a new algorithm that boosts the performance of a deep reinforcement learning agent through fast reward propagation. In contrast to the conventional use of the replay memory with uniform random sampling, our agent samples a whole episode and successively propagates the value of a state into its previous states. Our computationally efficient recursive algorithm allows sparse and delayed rewards to propagate effectively throughout the sampled episode. We evaluate our algorithm on the 2D MNIST Maze Environment and 49 games of the Atari 2600 Environment and show that our agent improves sample efficiency at a competitive computational cost.
TL;DR: We propose Episodic Backward Update, a novel deep reinforcement learning algorithm which samples transitions episode by episode and updates values recursively in a backward manner to achieve fast and stable learning.
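To illustrate the idea of backward value propagation described above, here is a minimal sketch of computing update targets over one sampled episode. The function name `episodic_backward_targets`, the diffusion coefficient `beta`, and the exact mixing rule are assumptions for illustration only; they are not taken from the abstract and may differ from the authors' implementation.

```python
import numpy as np

# Sketch: compute targets for one sampled episode by iterating from the last
# transition back to the first, so a sparse terminal reward reaches every
# earlier state in a single pass. Assumes the episode ends with done=True.

def episodic_backward_targets(rewards, actions, dones, q_target, gamma=0.99, beta=0.5):
    """Return targets y[t] for each transition of the episode.

    q_target: frozen target Q-value estimates, shape (episode_length + 1, n_actions).
    beta: hypothetical diffusion coefficient mixing the backward-propagated
          value into the target estimate of the action actually taken.
    """
    T = len(rewards)
    q_tilde = q_target.copy()          # temporary copy we may overwrite
    y = np.zeros(T)
    for t in reversed(range(T)):
        if dones[t]:
            y[t] = rewards[t]          # no bootstrap at the episode's end
        else:
            # Diffuse the already-computed target of step t+1 into the value
            # of the action taken there, then bootstrap from the best action.
            a_next = actions[t + 1]
            q_tilde[t + 1, a_next] = beta * y[t + 1] + (1.0 - beta) * q_tilde[t + 1, a_next]
            y[t] = rewards[t] + gamma * q_tilde[t + 1].max()
    return y

# Toy usage: a 5-step episode with a single sparse reward at the final step.
rng = np.random.default_rng(0)
T, n_actions = 5, 3
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
actions = rng.integers(n_actions, size=T)
dones = np.array([False, False, False, False, True])
q_target = rng.normal(size=(T + 1, n_actions))
print(episodic_backward_targets(rewards, actions, dones, q_target))
```

In this sketch the terminal reward influences every target in the episode after one backward sweep, which is the sample-efficiency benefit the abstract attributes to episodic backward updates; with uniform random sampling, the same information would need many separate updates to travel backward.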
Keywords: Deep Learning, Reinforcement Learning
Code: [suyoung-lee/Episodic-Backward-Update](https://github.com/suyoung-lee/Episodic-Backward-Update)
Data: [Arcade Learning Environment](https://paperswithcode.com/dataset/arcade-learning-environment), [MNIST](https://paperswithcode.com/dataset/mnist)
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:1805.12375/code)