Provable Benefit of Adaptivity in Adam

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Adam, convergence, non-uniform smoothness
TL;DR: We show that Adam converges under non-uniform smoothness.
Abstract: The Adaptive Moment Estimation (Adam) algorithm is widely adopted in practical applications due to its fast convergence. However, its theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely on the bounded smoothness assumption, referred to as the \emph{L-smooth condition}. Unfortunately, this assumption does not hold for many deep learning tasks. Moreover, we believe that this assumption obscures the true benefit of Adam, as the algorithm can adapt its update magnitude according to the local smoothness. This important feature of Adam becomes irrelevant when globally bounded smoothness is assumed. In this paper, we present the first convergence analysis of Adam without the bounded smoothness assumption. We demonstrate that Adam retains its convergence properties when the smoothness is linearly bounded by the gradient norm, referred to as the \emph{$(L_0, L_1)$-smooth condition}. Further, under the same setting, we refine the existing lower bound for SGD and show that SGD can be arbitrarily slower than Adam. To our knowledge, this is the first time that Adam and SGD have been rigorously compared in the same setting, one in which the advantage of Adam can be revealed. Our theoretical results shed new light on the advantage of Adam over SGD.
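For reference, the $(L_0, L_1)$-smooth condition discussed in the abstract is commonly stated as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, so the local smoothness may grow with the gradient norm rather than being bounded by a global constant $L$. The sketch below is not code from this submission; it is the standard Adam update (Kingma & Ba, 2015) with the usual default hyperparameters, included only to illustrate the adaptivity the abstract refers to: the effective per-coordinate step size lr / (sqrt(v_hat) + eps) shrinks in regions where recent gradients, and hence the local smoothness under the $(L_0, L_1)$ condition, are large.

```python
import numpy as np

# Minimal sketch of one Adam step (standard form, Kingma & Ba, 2015);
# not necessarily the exact variant analyzed in this submission.
def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections for t = 1, 2, ...
    v_hat = v / (1 - beta2 ** t)
    # The effective step size lr / (sqrt(v_hat) + eps) is small where recent
    # gradients are large, i.e., where the (L0, L1) condition permits the
    # local smoothness to be large.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

By contrast, SGD uses a fixed step size that cannot track this gradient-dependent smoothness, which is the intuition behind the lower bound mentioned in the abstract showing SGD can be arbitrarily slower than Adam in this setting.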
Supplementary Material: pdf
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6730