Temporal Difference Weighted Ensemble For Reinforcement Learning

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Blind Submission
Keywords: reinforcement learning, ensemble, deep q-network
TL;DR: Ensemble method for reinforcement learning that weights Q-functions based on accumulated TD errors.
Abstract: Combining multiple function approximators in machine learning models typically leads to better performance and robustness than a single function. In reinforcement learning, however, ensemble algorithms such as averaging and majority voting are not always optimal, because each function can learn fundamentally different optimal trajectories during exploration. In this paper, we propose the Temporal Difference Weighted (TDW) algorithm, an ensemble method that adjusts the weight of each Q-function's contribution based on its accumulated temporal difference errors. The advantage of this algorithm is that it improves ensemble performance by reducing the weights of Q-functions that are unfamiliar with the current trajectory. We provide experimental results on Gridworld and Atari tasks that show significant performance improvements over baseline algorithms.
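The abstract describes weighting each ensemble member's Q-values by how well it tracks the current trajectory, as measured by accumulated TD errors. Below is a minimal sketch of one plausible realization: a softmax over negated accumulated TD errors produces the weights, and the weighted sum of Q-values drives greedy action selection. The function names and the `temperature` parameter are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def tdw_weights(accumulated_td_errors, temperature=1.0):
    """Weight ensemble members by accumulated TD error.

    Members with lower accumulated TD error (i.e. more familiar with
    the current trajectory) receive higher weight. The softmax
    temperature is a hypothetical knob, not specified by the paper.
    """
    x = -np.asarray(accumulated_td_errors, dtype=float) / temperature
    x -= x.max()  # subtract max for numerical stability
    w = np.exp(x)
    return w / w.sum()

def tdw_action(q_values_per_member, accumulated_td_errors):
    """Select a greedy action from TD-error-weighted ensemble Q-values.

    q_values_per_member: array of shape (n_members, n_actions).
    """
    w = tdw_weights(accumulated_td_errors)
    combined = w @ np.asarray(q_values_per_member, dtype=float)
    return int(np.argmax(combined))
```

For example, a member whose accumulated TD error is near zero dominates the combination, while a member with a large accumulated error contributes almost nothing, which matches the abstract's intuition of down-weighting Q-functions unfamiliar with the current trajectory.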
Code: https://drive.google.com/open?id=1-EfJUguTqvWt32Zb2-AmyMLgrjLn5Va3