From Importance Sampling to Doubly Robust Policy Gradient

Jiawei Huang, Nan Jiang

2020 (modified: 19 Apr 2023) | ICML 2020 | Readers: Everyone

Abstract: We show that on-policy policy gradient (PG) and its variance reduction variants can be derived by taking finite-difference of function evaluations supplied by estimators from the importance samplin...
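As a rough illustration of the finite-difference idea stated in the abstract, the sketch below uses a one-step (bandit) problem with a softmax policy. It is not the authors' derivation, which is not reproduced on this page. The names `is_value_estimate`, `finite_difference_gradient`, and `score_function_gradient`, the reward means, and the step size `delta` are all assumptions made for this example. The sketch estimates the value of a slightly perturbed policy by importance sampling on data drawn from the current policy, takes a forward finite difference in each parameter, and compares the result with the usual score-function policy gradient computed on the same samples.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def is_value_estimate(theta_target, theta_behavior, actions, rewards):
    # Importance-sampling estimate of the target policy's value,
    # using actions sampled from the behavior policy.
    ratio = softmax(theta_target)[actions] / softmax(theta_behavior)[actions]
    return np.mean(ratio * rewards)

def finite_difference_gradient(theta, actions, rewards, delta=1e-6):
    # Forward finite difference of the IS estimate around theta,
    # one coordinate at a time (delta is an illustrative choice).
    base = is_value_estimate(theta, theta, actions, rewards)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        shifted = theta.copy()
        shifted[i] += delta
        grad[i] = (is_value_estimate(shifted, theta, actions, rewards) - base) / delta
    return grad

def score_function_gradient(theta, actions, rewards):
    # Score-function (REINFORCE-style) estimator: mean of grad log pi(a) * r.
    # For a softmax policy, grad log pi(a) = one_hot(a) - pi.
    pi = softmax(theta)
    score = np.eye(theta.size)[actions] - pi
    return np.mean(score * rewards[:, None], axis=0)

# Hypothetical three-armed bandit with noisy rewards.
rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1, 0.5])
true_means = np.array([1.0, 0.0, 2.0])
actions = rng.choice(3, size=1000, p=softmax(theta))
rewards = true_means[actions] + rng.normal(size=1000)

fd = finite_difference_gradient(theta, actions, rewards)
pg = score_function_gradient(theta, actions, rewards)
print("finite difference of IS estimate:", fd)
print("score-function policy gradient:  ", pg)
```

On the same batch of samples the two vectors should agree up to the finite-difference truncation error, since the derivative of the importance ratio at the current parameters is the score function. The multi-step and variance-reduced cases mentioned in the abstract are outside the scope of this sketch.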
