Keywords: Optimizers, Image Classification, Language Modeling, Generative Adversarial Networks, Reinforcement Learning
Abstract: Reproducibility Summary
Scope of Reproducibility
The proposed optimizer, AdaBelief, claims to achieve three goals: fast convergence as in adaptive methods, good
generalization as in SGD, and training stability. We perform experiments to validate the claims of the paper [28].
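For intuition, the update rule central to these claims replaces Adam's second moment (an EMA of the squared gradient) with an EMA of the squared deviation of the gradient from its own EMA, the "belief". Below is a minimal NumPy sketch of this core idea; variable names are ours, and the paper's additional options (rectification, decoupled weight decay, adding eps inside the second-moment update) are omitted for brevity.
```python
import numpy as np

def adabelief_step(param, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of the gradient (as in Adam).
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: EMA of the squared *deviation* of the gradient from m
    # (the "belief"), instead of Adam's EMA of grad**2.
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2
    # Bias-corrected estimates, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    # Steps are large where the observed gradient matches the belief (small s).
    param = param - lr * m_hat / (np.sqrt(s_hat) + eps)
    return param, m, s

# Toy usage: minimize f(x) = x**2, whose gradient is 2x.
x, m, s = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    x, m, s = adabelief_step(x, 2 * x, m, s, t, lr=0.1)
print(x)  # approaches 0
```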
Methodology
To validate these claims, we reproduce the experiments on image classification with the CIFAR-10, CIFAR-100 and ImageNet
datasets, language modeling with Penn Treebank, and generative modeling with the WGAN, WGAN-GP and SN-GAN
architectures. We use the code provided by the author. All experiments were performed on 8 NVIDIA V100 GPUs
and took about 1096 GPU hours in total. Our entire code is provided in the supplementary material.
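For reference, the author's code exposes AdaBelief as a drop-in PyTorch optimizer. The snippet below is a minimal sketch assuming the adabelief-pytorch package (installed via pip); the argument names follow the package's documented defaults and may differ across versions.
```python
# Minimal sketch of using the reference implementation as a drop-in optimizer.
import torch
from adabelief_pytorch import AdaBelief

model = torch.nn.Linear(10, 2)
optimizer = AdaBelief(model.parameters(), lr=1e-3, eps=1e-16,
                      betas=(0.9, 0.999), weight_decouple=True, rectify=False)

# One training step on random data.
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```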
Results
The image classification experiments on CIFAR-10, CIFAR-100 and ImageNet are reproduced to within 0.29%, 0.18%
and 0.25% of the reported values, respectively. The language modeling experiments show an average deviation of 0.22%,
while the generative modeling experiments on WGAN, WGAN-GP and SN-GAN are replicated to within 2.2%, 1.8%
and 0.33% of the reported values.
We perform ablation studies on the change of dataset in language modeling and on the effect of weight decay on ImageNet.
We also analyze the generalization ability of the optimizers and the training stability of GANs. All of the results
largely support the claims made in the paper [28].
What was easy
The authors provide implementations for most of the experiments presented in the paper. The well-documented code and
lucid paper helped us understand the experiments clearly.
What was difficult
The challenging aspects of our study were: (1) grid searches for optimal hyperparameters (HPs) in cases where HPs were
not provided or results did not match (see the sketch below), (2) time- and resource-intensive experiments such as ImageNet (∼22 hrs.) and
SN-GAN (∼15 hrs.), and (3) writing code to evaluate the claims of the AdaBelief paper.
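The grid searches in (1) followed the usual exhaustive pattern; here is a minimal sketch, where train_and_eval is a hypothetical helper standing in for a full training run that returns a validation score.
```python
# Minimal sketch of an exhaustive hyperparameter grid search.
import itertools

def grid_search(train_and_eval, grid):
    """Evaluate every hyperparameter combination; return the best one."""
    best_score, best_hp = float("-inf"), None
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        hp = dict(zip(keys, values))
        score = train_and_eval(**hp)  # e.g. validation accuracy
        if score > best_score:
            best_score, best_hp = score, hp
    return best_hp, best_score

# e.g. grid_search(run, {"lr": [1e-3, 1e-2], "eps": [1e-8, 1e-16], "weight_decay": [0, 5e-4]})
```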
Communication with original authors
We communicated with the original author, Juntang Zhuang, on multiple occasions regarding doubts related to hyperparameters
and code; he replied promptly and helped us.
Paper URL: https://openreview.net/forum?id=YeSwJDOnTRY