Published: 01 Jan 2022, Last Modified: 12 May 2023, ICML 2022, Readers: Everyone
Abstract: In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to re...
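
The overestimation effect named in the abstract can be illustrated with a minimal sketch. This is not the paper's method; it is a toy setting chosen here as an assumption: every action has a true value of 0, and each estimate is that value plus zero-mean Gaussian noise. The function name `max_estimate_bias` and all of its parameters are hypothetical. Because the true maximum is 0, any positive average of the maximal estimate is bias introduced by taking a max over noisy values, and it should grow with the noise level.

```python
import random
import statistics


def max_estimate_bias(num_actions=10, noise_std=1.0, trials=5000, seed=0):
    """Average of max_a Q_hat(a) when every true Q(a) is 0 (toy assumption).

    Each estimate is the true value (0) plus zero-mean Gaussian noise,
    so a positive result reflects overestimation caused by the max operator.
    """
    rng = random.Random(seed)
    maxima = [
        max(rng.gauss(0.0, noise_std) for _ in range(num_actions))
        for _ in range(trials)
    ]
    return statistics.fmean(maxima)


# Compare the bias at two noise levels; the true maximal value is 0 in both.
low_noise_bias = max_estimate_bias(noise_std=0.1)
high_noise_bias = max_estimate_bias(noise_std=1.0)
print(f"bias with std 0.1: {low_noise_bias:.3f}")
print(f"bias with std 1.0: {high_noise_bias:.3f}")
```

In this toy setting each individual estimate is unbiased, yet the maximum over them is not, which is one way to see why higher variance in value estimates tends to inflate a max-based target.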