Convergence Analysis of Nesterov's Accelerated Gradient Descent under Relaxed Assumptions

ICLR 2026 Conference Submission 17618 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Nesterov's Accelerated Gradient, Convergence Theory, Convex Optimization, Stochastic Optimization, Affine Variance
Abstract: We study convergence rates of Nesterov's Accelerated Gradient Descent (NAG) method for convex optimization in both deterministic and stochastic settings. We focus on a more general smoothness condition that arises, both empirically and theoretically, in several machine learning problems. Given access to exact gradients of the objective, we show an accelerated convergence rate of order $\mathcal{O}\left(1/T^2\right)$ in terms of the function value gap, matching the optimal rate for standard smooth convex optimization \citep{nesterov1983method}. Under the relaxed affine-variance noise assumption for stochastic optimization, we establish a high-probability convergence rate of order $\tilde{\mathcal{O}}\left(\sqrt{\log\left(1/\delta\right)/T}\right)$, which improves to $\tilde{\mathcal{O}}\left(\log\left(1/\delta\right)/T^2\right)$ when the noise parameters are sufficiently small. Here, $T$ denotes the total number of iterations and $\delta$ is the failure probability. Up to logarithmic factors, our high-probability rate matches the in-expectation rate obtained in \citep{ghadimi2016accelerated}, where bounded-variance noise and Lipschitz smoothness are assumed.
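For reference, the sketch below shows the classical deterministic NAG iteration of \citep{nesterov1983method}, which attains the $\mathcal{O}(1/T^2)$ rate discussed in the abstract. The step size $1/L$ and the momentum schedule used here are the standard choices for $L$-smooth objectives; they are illustrative assumptions and not the paper's variant under the relaxed smoothness condition.

```python
import numpy as np

def nag(grad, x0, L, T):
    """Classical Nesterov accelerated gradient for an L-smooth convex objective.

    grad : callable returning the exact gradient of f
    x0   : initial iterate (numpy array)
    L    : smoothness constant (step size 1/L)
    T    : number of iterations
    """
    x, y = x0.copy(), x0.copy()
    t = 1.0
    for _ in range(T):
        x_next = y - grad(y) / L                            # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # standard momentum schedule
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)    # momentum extrapolation
        x, t = x_next, t_next
    return x

# Usage: minimize the quadratic f(x) = 0.5 * ||A x - b||^2 (hypothetical example data)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    L = np.linalg.norm(A, 2) ** 2                           # smoothness constant = sigma_max(A)^2
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    x_T = nag(lambda x: A.T @ (A @ x - b), np.zeros(20), L, T=500)
    print(np.linalg.norm(x_T - x_star))                     # small residual, reflecting the O(1/T^2) decay
```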
Primary Area: optimization
Submission Number: 17618