2018 (modified: 09 Nov 2022), COLT 2018, Readers: Everyone
Abstract: Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most wi...