Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance

TMLR Paper 3260 Authors

29 Aug 2024 (modified: 17 Sept 2024) · Under review for TMLR · CC BY 4.0
Abstract: This paper provides the first tight convergence analyses for RMSProp and Adam for non-convex optimization under the most relaxed assumptions of coordinate-wise generalized smoothness and affine noise variance. We first analyze RMSProp, a special case of Adam with adaptive learning rates but without first-order momentum. Specifically, to address the challenges arising from the dependence among the adaptive update, the unbounded gradient estimate, and the Lipschitz constant, we demonstrate that the first-order term in the descent lemma converges and that its denominator is upper bounded by a function of the gradient norm. Based on this result, we show that RMSProp with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. We then generalize our analysis to Adam, where the additional challenge is due to a mismatch between the gradient and the first-order momentum. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. We show that Adam with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. Our complexity results for both RMSProp and Adam match the complexity lower bound established in Arjevani et al. (2023).
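For reference, a minimal sketch of the standard coordinate-wise Adam recursion that the abstract refers to; the notation ($g_t$ for the stochastic gradient, $\beta_1, \beta_2$ for the momentum parameters, $\eta$ for the step size, $\xi$ for the stabilization constant) is standard rather than taken from the paper, and the paper's exact normalization and hyperparameter choices may differ:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \qquad x_{t+1} = x_t - \frac{\eta\, m_t}{\sqrt{v_t} + \xi},$$

where all operations are coordinate-wise. Setting $\beta_1 = 0$ (so that $m_t = g_t$) recovers RMSProp, which is why the abstract treats RMSProp as the special case of Adam with adaptive learning rates but without first-order momentum.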
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: The new version uses citep and citet for bibliographic citations.
Assigned Action Editor: ~Stephen_Becker1
Submission Number: 3260