Published: 01 Jan 2021, Last Modified: 10 May 2023, ICML 2021, Readers: Everyone
Abstract: Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD lear...