[Re] AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Published: 11 Apr 2022, Last Modified: 05 May 2023, RC2021
Keywords: Optimizers, Image Classification, Language Modeling, Generative Adversarial Networks, Reinforcement Learning
Abstract: Reproducibility Summary

Scope of Reproducibility: The proposed optimizer, AdaBelief, claims to achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. We perform experiments to validate the claims of the paper [28].

Methodology: To validate these claims, we reproduce the experiments on image classification with the CIFAR-10, CIFAR-100 and ImageNet datasets; language modeling with Penn Treebank; and generative modeling with the WGAN, WGAN-GP and SN-GAN architectures. We use the code provided by the author. All experiments were performed on 8 NVIDIA V100 GPUs and took about 1096 GPU hours in total. Our entire code is provided in the supplementary material.

Results: The image classification experiments on CIFAR-10, CIFAR-100 and ImageNet are reproduced to within 0.29%, 0.18% and 0.25% of the reported values, respectively. The language modeling experiments produce an average deviation of 0.22%, while the generative modeling experiments on WGAN, WGAN-GP and SN-GAN are replicated to within 2.2%, 1.8% and 0.33% of the reported values. We perform ablation studies on the change of dataset in language modeling and on the effect of weight decay on ImageNet. We also analyze the generalization ability of the optimizers and the training stability of the GANs. All of the results largely support the claims made in the paper [28].

What was easy: The authors provide implementations for most of the experiments presented in the paper. Well-documented code and a lucid paper helped us understand the experiments clearly.

What was difficult: The challenging aspects of our study were: (1) grid search for optimal hyperparameters (HP) in cases where HP were not provided or the results did not match, (2) time- and resource-intensive experiments such as ImageNet (~22 hrs.) and SN-GAN (~15 hrs.), and (3) writing code to evaluate the claims of the AdaBelief paper.
Communication with original authors: We communicated with the original author, Juntang Zhuang, on multiple occasions about doubts related to hyperparameters and code, to which he promptly replied and helped us.
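For readers unfamiliar with the method under study, the core idea of AdaBelief is that the second-moment term tracks the squared deviation of each gradient from its exponential moving average (the "belief"), rather than the raw squared gradient as in Adam. A minimal NumPy sketch of one update step, written for illustration (the function name and signature are our own, not the authors' reference implementation):

```python
import numpy as np

def adabelief_step(theta, g, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief update (illustrative sketch, not the official code).

    Unlike Adam, the second moment s tracks (g - m)^2, the deviation of the
    observed gradient from its EMA prediction m: a small deviation means the
    gradient matched our "belief", so a larger step is taken.
    """
    m = beta1 * m + (1 - beta1) * g                    # EMA of gradients
    s = beta2 * s + (1 - beta2) * (g - m) ** 2 + eps   # EMA of squared deviation
    m_hat = m / (1 - beta1 ** t)                       # bias correction
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s
```

As a sanity check, iterating this step on a simple quadratic objective (gradient `2 * theta`) drives the parameter toward zero.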
Paper Url: https://openreview.net/forum?id=YeSwJDOnTRY&referrer=%5BML%20Reproducibility%20Challenge%202021%20Spring%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2021%2FSpring)
Supplementary Material: zip