Abstract: This project attempted to reproduce several of the experimental findings presented by Tessler et al. (2019). In that paper, the authors present a novel reinforcement learning algorithm called the Generative Actor Critic (GAC), an implementation of distributional policy optimization (DPO) for continuous control problems. The authors evaluate GAC on several MuJoCo environments and obtain results competitive with state-of-the-art policy gradient baselines. The replicated GAC algorithm ultimately succeeded in reproducing the learning curves for GAC on the MuJoCo Humanoid task, using both an autoregressive implicit quantile network (AIQN) and an implicit quantile network (IQN) as the actor. These findings, when compared with other algorithms, support the authors' claim that GAC could be an alternative approach for continuous control. In addition to reproducing the original experiments, this report also describes diagnostic investigations conducted throughout the project, with the objective of providing a better understanding of certain critical details of the algorithm. The replicated implementation of GAC used in this report can be found in the following public git repository: https://github.com/gwbcho/dpo-replication.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=H1MbSESlLB&noteId=rJeVKL1SuB