Keywords: reinforcement learning, ensemble, deep Q-network
TL;DR: Ensemble method for reinforcement learning that weights Q-functions based on accumulated TD errors.
Abstract: Combining multiple function approximators in machine learning models typically leads to better performance and greater robustness than using a single function. In reinforcement learning, however, ensemble algorithms such as averaging and majority voting are not always optimal, because each function can learn fundamentally different optimal trajectories during exploration. In this paper, we propose the Temporal Difference Weighted (TDW) algorithm, an ensemble method that adjusts the weight of each Q-function's contribution based on its accumulated temporal difference (TD) errors. The advantage of this algorithm is that it improves ensemble performance by reducing the weights of Q-functions that are unfamiliar with the current trajectory. We provide experimental results on Gridworld tasks and Atari tasks showing significant performance improvements over baseline algorithms.
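To make the weighting idea concrete, below is a minimal Python sketch of a TD-error-weighted ensemble, assuming a softmax over negative, exponentially decayed accumulated absolute TD errors. The class name, decay scheme, and temperature parameter are illustrative assumptions rather than the paper's exact formulation; see the paper and linked code for the actual TDW algorithm.

```python
import numpy as np

class TDWEnsemble:
    """Sketch: weight each Q-function by its accumulated TD error.

    Q-functions that fit the current trajectory poorly (large accumulated
    TD error) receive smaller weights via a softmax over negative errors.
    """

    def __init__(self, q_functions, decay=0.99, temperature=1.0):
        self.q_functions = q_functions             # callables: state -> array of Q-values per action
        self.errors = np.zeros(len(q_functions))   # accumulated absolute TD errors (assumption)
        self.decay = decay                         # exponential decay of old errors (assumption)
        self.temperature = temperature             # softmax temperature (assumption)

    def weights(self):
        # Softmax over negative accumulated errors: unfamiliar Q-functions get less weight.
        logits = -self.errors / self.temperature
        logits -= logits.max()                     # numerical stability
        w = np.exp(logits)
        return w / w.sum()

    def act(self, state):
        # Weighted sum of each member's Q-values, then greedy action selection.
        q_values = np.stack([q(state) for q in self.q_functions])
        combined = self.weights() @ q_values
        return int(np.argmax(combined))

    def update_errors(self, state, action, reward, next_state, gamma=0.99):
        # Accumulate each Q-function's absolute TD error on the observed transition.
        for i, q in enumerate(self.q_functions):
            td_error = reward + gamma * q(next_state).max() - q(state)[action]
            self.errors[i] = self.decay * self.errors[i] + abs(td_error)
```

In this sketch, a Q-function whose predictions diverge from observed returns along the current trajectory accumulates error and is down-weighted, which is the intuition the abstract describes.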
Code: https://drive.google.com/open?id=1-EfJUguTqvWt32Zb2-AmyMLgrjLn5Va3