ADOPT: Modified Adam Can Converge with the Optimal Rate with Any Hyperparameters

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: adaptive gradient method, convergence analysis, nonconvex optimization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Adaptive gradient methods based on exponential moving averages, such as Adam and RMSprop, are widely used for deep learning. However, they are known not to converge unless their hyperparameters are chosen in a problem-dependent manner. There have been many attempts to fix this non-convergence (e.g., AMSGrad), but they rely on the impractical assumption that the stochastic gradients are uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O}(1/\sqrt{T})$ for any hyperparameter choice and without the bounded stochastic gradient assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second-moment estimate and by swapping the order of the momentum update and the scaling by the second-moment estimate. We also conduct extensive numerical experiments and verify that ADOPT achieves competitive or even better results than Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning.
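The following is a minimal NumPy sketch contrasting a standard Adam step with an ADOPT-style step as the abstract describes it: the gradient is scaled by the previous second-moment estimate (so the current gradient is excluded from it), and this scaling happens before the momentum update. The function names, the max-based eps handling, the initialization of `v`, and the omission of bias correction in the ADOPT-style step are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam: the current gradient g enters v before v is used for scaling.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adopt_like_step(theta, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # ADOPT-style sketch: scale the gradient by the *previous* second-moment
    # estimate (current gradient excluded), apply momentum to the scaled
    # gradient, and only then refresh the second-moment estimate.
    scaled_g = g / np.maximum(np.sqrt(v), eps)   # uses v_{t-1}, not v_t
    m = beta1 * m + (1 - beta1) * scaled_g
    theta = theta - lr * m
    v = beta2 * v + (1 - beta2) * g ** 2
    return theta, m, v

# Toy usage on noisy gradients of 0.5 * ||theta||^2 (purely illustrative).
rng = np.random.default_rng(0)
theta, m, v = np.ones(3), np.zeros(3), np.ones(3)  # v started away from zero in this sketch
for _ in range(100):
    g = theta + rng.normal(size=3)
    theta, m, v = adopt_like_step(theta, g, m, v)
```

The point of the sketch is only the ordering: the scaling factor applied to the update no longer depends on the current gradient sample, which is the modification the abstract highlights.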
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2942