Keywords: reinforcement learning, ensemble, deep Q-network
TL;DR: Ensemble method for reinforcement learning that weights Q-functions based on accumulated TD errors.
Abstract: Combining multiple function approximators in machine learning models typically leads to better performance and greater robustness than using a single function. In reinforcement learning, however, ensemble algorithms such as averaging and majority voting are not always optimal, because each function can learn fundamentally different optimal trajectories during exploration. In this paper, we propose the Temporal Difference Weighted (TDW) algorithm, an ensemble method that adjusts the weight of each Q-function's contribution based on its accumulated temporal difference (TD) errors. The advantage of this algorithm is that it improves ensemble performance by reducing the weights of Q-functions that are unfamiliar with the current trajectory. We provide experimental results on Gridworld tasks and Atari tasks showing significant performance improvements over baseline algorithms.
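To make the weighting idea concrete, below is a minimal Python sketch of a TD-error-weighted ensemble, assuming a softmax over negative, exponentially decayed accumulated absolute TD errors. The class name, decay scheme, and temperature parameter are illustrative assumptions rather than the paper's exact formulation; see the paper and linked code for the actual TDW algorithm.

```python
import numpy as np

class TDWEnsemble:
    """Sketch: weight each Q-function by its accumulated TD error.

    Q-functions that fit the current trajectory poorly (large accumulated
    TD error) receive smaller weights via a softmax over negative errors.
    """

    def __init__(self, q_functions, decay=0.99, temperature=1.0):
        self.q_functions = q_functions             # callables: state -> array of Q-values per action
        self.errors = np.zeros(len(q_functions))   # accumulated absolute TD errors (assumption)
        self.decay = decay                         # exponential decay of old errors (assumption)
        self.temperature = temperature             # softmax temperature (assumption)

    def weights(self):
        # Softmax over negative accumulated errors: unfamiliar Q-functions get less weight.
        logits = -self.errors / self.temperature
        logits -= logits.max()                     # numerical stability
        w = np.exp(logits)
        return w / w.sum()

    def act(self, state):
        # Weighted sum of each member's Q-values, then greedy action selection.
        q_values = np.stack([q(state) for q in self.q_functions])
        combined = self.weights() @ q_values
        return int(np.argmax(combined))

    def update_errors(self, state, action, reward, next_state, gamma=0.99):
        # Accumulate each Q-function's absolute TD error on the observed transition.
        for i, q in enumerate(self.q_functions):
            td_error = reward + gamma * q(next_state).max() - q(state)[action]
            self.errors[i] = self.decay * self.errors[i] + abs(td_error)
```

In this sketch, a Q-function whose predictions diverge from observed returns along the current trajectory accumulates error and is down-weighted, which is the intuition the abstract describes.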
Code: https://drive.google.com/open?id=1-EfJUguTqvWt32Zb2-AmyMLgrjLn5Va3