Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Published: 2022, Last Modified: 12 May 2023, ICML 2022, Readers: Everyone

Abstract: In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to re...
