Optimal Estimation of Policy Gradient via Double Fitted Iteration
Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
2022 (modified: 25 Apr 2023)
ICML 2022
Readers: Everyone
Abstract:
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventiona...