Generalized Off-Policy Actor Critic

02 Dec 2019 (modified: 05 May 2023) · NeurIPS 2019 Reproducibility Challenge Blind Report · Readers: Everyone
Abstract: We attempt to replicate the simulation results in Generalized Off-Policy Actor-Critic (Zhang et al., 2019), which unifies existing objectives for off-policy gradient algorithms in the continuing reinforcement learning setting by proposing the counterfactual objective. We replicate the robot simulation results with moderate success, but our implementation of one of the baselines performs better than the authors report. We are also unable to reconcile the paper’s theoretical motivations with its empirical results: under the described theoretical motivations and in the same robot simulation setting, the counterfactual objective produces subpar results.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=ryeCHNSlIS
