A Temporal-Difference Approach to Policy Gradient Estimation
Samuele Tosatto, Andrew Patterson, Martha White, Rupam Mahmood
2022 (modified: 27 Sept 2022)
ICML 2022
Readers: Everyone
Abstract:
The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on th...