Optimal Estimation of Policy Gradient via Double Fitted Iteration

2022 (modified: 25 Apr 2023), ICML 2022, Readers: Everyone
Abstract: Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventiona...