Abstract: We replicate the simulation results in Generalized Off-Policy Actor-Critic (Zhang et al., 2019), which unifies existing objectives for off-policy gradient algorithms in the continuing reinforcement learning setting by proposing the counterfactual objective. We replicate their robot simulation results with moderate success, but we find that our implementation of one of the baselines achieves better performance than that reported by the authors. We are also unable to reconcile the paper's theoretical motivations with its empirical results: under the described theoretical motivations and the same robot simulation setting, we show that the counterfactual objective produces subpar results.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=ryeCHNSlIS