2021 (modified: 13 Oct 2022), ICML 2021
Abstract: Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable...
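The kind of instability mentioned above can be seen in a very small setting. The sketch below is illustrative and is not the paper's method. It implements one semi-gradient off-policy TD(0) step for a linear value function v(s) = w · x(s), with the TD error scaled by an importance-sampling ratio rho = pi(a|s) / mu(a|s). It then applies that step repeatedly to a single transition, using the textbook "w → 2w" example: features x(s1) = 1, x(s2) = 2, reward 0. If only this transition is updated, as can happen when the behavior policy's data distribution differs from the target policy's, each step multiplies w by 1 + alpha(2·gamma − 1), so w diverges whenever gamma > 0.5. The helper names `off_policy_td0_update` and `w_to_2w_example` are hypothetical. Fixing rho = 1 in the example is a simplifying assumption so that only the skewed update distribution drives the growth.

```python
def off_policy_td0_update(w, x_s, x_next, reward, rho, alpha, gamma):
    """One semi-gradient off-policy TD(0) step for linear v(s) = w . x(s).

    rho is the importance-sampling ratio pi(a|s) / mu(a|s) for the action
    the behavior policy mu actually took (an illustrative choice here).
    """
    v_s = sum(wi * xi for wi, xi in zip(w, x_s))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    td_error = reward + gamma * v_next - v_s
    # Semi-gradient: only v(s) is differentiated, not the bootstrap target.
    return [wi + alpha * rho * td_error * xi for wi, xi in zip(w, x_s)]


def w_to_2w_example(steps, alpha=0.1, gamma=0.9, w0=1.0):
    """Repeatedly update only the transition s1 -> s2 with x(s1) = 1,
    x(s2) = 2 and reward 0. Each step multiplies w by
    1 + alpha * (2 * gamma - 1). rho is fixed to 1 (a simplifying assumption).
    """
    w = [w0]
    for _ in range(steps):
        w = off_policy_td0_update(w, [1.0], [2.0], reward=0.0, rho=1.0,
                                  alpha=alpha, gamma=gamma)
    return w[0]


if __name__ == "__main__":
    print(w_to_2w_example(50))             # grows roughly like 1.08 ** 50
    print(w_to_2w_example(50, gamma=0.4))  # shrinks roughly like 0.98 ** 50
```

With gamma = 0.9 and alpha = 0.1, the weight grows geometrically even though every true value is 0. With gamma = 0.4, the same update contracts toward 0. Shrinking the step size does not fix the divergent case, because the growth factor stays above 1 for any alpha > 0 when gamma > 0.5.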