A Temporal-Difference Approach to Policy Gradient Estimation

ICML 2022 (modified: 27 Sept 2022)
Abstract: The policy gradient theorem (Sutton et al., 2000) prescribes the use of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on th...
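To make the abstract's reference to the policy gradient theorem concrete, here is a minimal Monte-Carlo (REINFORCE-style) sketch of a discounted policy-gradient estimate for a tabular softmax policy. The 2-state, 2-action toy MDP and all names in it are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9  # toy MDP sizes and discount (assumed)
theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    # Softmax over actions for state s, stabilized by subtracting the max.
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def grad_log_pi(s, a):
    # Gradient of log softmax: one-hot(a) - pi(.|s), placed in row s.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

def step(s, a):
    # Toy dynamics: action 0 stays, action 1 flips state; reward 1 in state 1.
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def episode_grad(T=20):
    # One-episode estimate of sum_t gamma^t * G_t * grad log pi(a_t | s_t),
    # i.e. the discounted-state-distribution form of the policy gradient.
    s, traj = 0, []
    for _ in range(T):
        a = rng.choice(n_actions, p=policy(s))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    g, G = np.zeros_like(theta), 0.0
    for t in reversed(range(T)):
        s, a, r = traj[t]
        G = r + gamma * G  # discounted return from time t onward
        g += (gamma ** t) * G * grad_log_pi(s, a)
    return g

g_hat = episode_grad()
print(g_hat.shape)  # one stochastic gradient estimate, shape (2, 2)
```

Averaging `episode_grad()` over many episodes approximates the gradient that the theorem characterizes; the paper's contribution concerns how this quantity is estimated.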