Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: We propose Episodic Backward Update - a new algorithm to boost the performance of a deep reinforcement learning agent by fast reward propagation. In contrast to the conventional use of the replay memory with uniform random sampling, our agent samples a whole episode and successively propagates the value of a state into its previous states. Our computationally efficient recursive algorithm allows sparse and delayed rewards to propagate effectively throughout the sampled episode. We evaluate our algorithm on 2D MNIST Maze Environment and 49 games of the Atari 2600 Environment and show that our agent improves sample efficiency with a competitive computational cost.
  • TL;DR: We propose Episodic Backward Update, a novel deep reinforcement learning algorithm which samples transitions episode by episode and updates values recursively in a backward manner to achieve fast and stable learning.
  • Keywords: Deep Learning, Reinforcement Learning