A Temporal-Difference Approach to Policy Gradient Estimation

ICML 2022 (modified: 27 Sept 2022)
Abstract: The policy gradient theorem (Sutton et al., 2000) prescribes the use of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on th...
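To make the abstract's reference to the policy gradient theorem concrete, here is a minimal Monte-Carlo (REINFORCE-style) sketch of a discounted policy-gradient estimate for a tabular softmax policy. The 2-state, 2-action toy MDP and all names in it are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9  # toy MDP sizes and discount (assumed)
theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    # Softmax over actions for state s, stabilized by subtracting the max.
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def grad_log_pi(s, a):
    # Gradient of log softmax: one-hot(a) - pi(.|s), placed in row s.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

def step(s, a):
    # Toy dynamics: action 0 stays, action 1 flips state; reward 1 in state 1.
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def episode_grad(T=20):
    # One-episode estimate of sum_t gamma^t * G_t * grad log pi(a_t | s_t),
    # i.e. the discounted-state-distribution form of the policy gradient.
    s, traj = 0, []
    for _ in range(T):
        a = rng.choice(n_actions, p=policy(s))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    g, G = np.zeros_like(theta), 0.0
    for t in reversed(range(T)):
        s, a, r = traj[t]
        G = r + gamma * G  # discounted return from time t onward
        g += (gamma ** t) * G * grad_log_pi(s, a)
    return g

g_hat = episode_grad()
print(g_hat.shape)  # one stochastic gradient estimate, shape (2, 2)
```

Averaging `episode_grad()` over many episodes approximates the gradient that the theorem characterizes; the paper's contribution concerns how this quantity is estimated.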