Adaptive Exponential Decay Rates for Adam

ICLR 2025 Conference Submission1228 Authors

17 Sept 2024 (modified: 13 Oct 2024)ICLR 2025 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Optimization method, deep neural networks, Adam and its variants
Abstract: Adam and its variants, including AdaBound, AdamW, and AdaBelief, have gained widespread popularity for enhancing the learning speed and generalization performance of deep neural networks. This optimization technique adjusts weight vectors by utilizing predetermined exponential decay rates (i.e.,$\beta_1$ = 0.9, $\beta_2$ = 0.999) based on the first moment estimate and the second raw moment estimate of the gradient. However, the default exponential decay rates might not be optimal, and the process of tuning them through trial and error with experience proves to be time-consuming. In this paper, we introduce AdamE, a novel variant of Adam designed to automatically leverage dynamic exponential decay rates on the first moment estimate and the second raw moment estimate of the gradient. Additionally, we provide theoretical proof of the convergence of AdamE in both convex and non-convex cases. To validate our claims, we perform experiments across various neural network architectures and tasks. Comparative analyses with adaptive methods utilizing default exponential decay rates reveal that AdamE consistently achieves rapid convergence and high accuracy in language modeling, node classification, and graph clustering tasks.
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: pdf
Submission Number: 1228
Loading