Abstract: This project attempted to reproduce several of the experimental findings presented by Tessler et al. (2019). In that paper, the authors present a novel reinforcement learning algorithm called the Generative Actor Critic (GAC), an implementation of distributional policy optimization (DPO) for continuous control problems. The authors evaluate GAC on several MuJoCo environments and obtain results competitive with state-of-the-art policy gradient baselines. The replicated GAC algorithm ultimately succeeded in reproducing the learning curves for GAC on the MuJoCo Humanoid task, using both an autoregressive implicit quantile network (AIQN) and an implicit quantile network (IQN) as the actor. These findings, when compared with other algorithms, support the authors' claim that GAC could be an alternative approach for continuous control. In addition to reproducing the original experiments, this report also describes diagnostic investigations conducted throughout the project, with the objective of providing a better understanding of certain critical details of the algorithm. The replicated implementation of GAC used in this report can be found in the following public git repository: https://github.com/gwbcho/dpo-replication.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=H1MbSESlLB&noteId=rJeVKL1SuB