On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Dongruo Zhou; Jinghui Chen; Yuan Cao; Ziyan Yang; Quanquan Gu

Published: 16 Mar 2024, Last Modified: 21 Oct 2025. Accepted by TMLR. License: CC BY 4.0

Event Certifications: iclr.cc/ICLR/2025/Journal_Track

Abstract: Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.

Certifications: Featured Certification, J2C Certification

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: N/A

Assigned Action Editor: ~Peter_Richtarik1

Submission Number: 1878
