Keywords: Momentum, nonconvex learning, generalization
Abstract: Stochastic gradient descent with momentum (SGDM) is widely used in machine learning, yet high-probability learning bounds for SGDM in non-convex settings remain scarce. In this paper, we provide high-probability convergence bounds and generalization bounds for SGDM. First, we establish such bounds for the gradient norm in the general non-convex case. The resulting convergence bounds are tighter than existing theoretical results, and the obtained generalization bounds appear to be the first of their kind for SGDM. Next, under the Polyak-{\L}ojasiewicz condition, we derive bounds for the function-value error instead of the gradient norm, and the corresponding learning rates are faster than in the general non-convex case. Finally, by additionally assuming a mild Bernstein condition on the gradient, we obtain even sharper generalization bounds, whose learning rates can reach $\widetilde{\mathcal{O}}(1/n^2)$ in the low-noise regime, where $n$ is the sample size. Overall, we provide a systematic study of high-probability learning bounds for non-convex SGDM.
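For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of the standard heavy-ball form of SGDM (momentum coefficient $\beta$, step size $\eta$). The specific function, coefficients, and step counts here are illustrative choices, not values from the paper; the paper's setting uses stochastic gradients sampled from data, whereas this toy uses an exact gradient for checkability.

```python
import numpy as np

def sgdm(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum update (one common SGDM form):
        v_{t+1} = beta * v_t + grad(x_t)
        x_{t+1} = x_t - lr * v_{t+1}
    `grad` stands in for a (possibly stochastic) gradient oracle."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # momentum buffer, initialized to zero
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
    return x

# Toy usage: minimize the smooth quadratic f(x) = ||x||^2 / 2, whose gradient is x.
x_final = sgdm(lambda x: x, x0=[3.0, -2.0])
print(np.linalg.norm(x_final))  # the gradient norm ||x|| shrinks toward 0
```

The paper's bounds concern exactly this kind of quantity: how fast the gradient norm at the iterates shrinks (general non-convex case), or how fast the function-value error shrinks (under the Polyak-{\L}ojasiewicz condition).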
Primary Area: learning theory
Submission Number: 25501