Abstract: We replicate the simulation results in Generalized Off-Policy Actor-Critic (Zhang et al., 2019), which unifies existing objectives for off-policy gradient algorithms in the continuing reinforcement learning setting by proposing the counterfactual objective. We replicate their robot simulation results with moderate success, but we find that our implementation of one of the baselines achieves better performance than that reported by the authors. We are also unable to reconcile the paper's theoretical motivations with its empirical results: under the described theoretical motivations and the same robot simulation setting, we show that the counterfactual objective produces subpar results.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=ryeCHNSlIS