Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 poster. License: CC BY 4.0
Keywords: convex optimization, stochastic optimization, smooth optimization, generalized smoothness
TL;DR: We show that clipped SGD converges with high probability on convex $(L_0,L_1)$-smooth functions.
Abstract: We study stochastic gradient descent (SGD) with gradient clipping on convex functions under a generalized smoothness assumption called $(L_0,L_1)$-smoothness. Using gradient clipping, we establish a high-probability convergence rate that matches the SGD rate in the $L$-smooth case up to polylogarithmic factors and additive terms. We also propose a variation of adaptive SGD with gradient clipping, which achieves the same guarantee. We perform empirical experiments to examine our theory and algorithmic choices.
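The abstract refers to SGD with gradient clipping. As a point of reference, below is a minimal sketch of a generic clipped-SGD update in Python; the function `stochastic_grad`, the clipping threshold `c`, and the step size `eta` are placeholders, not the specific choices analyzed in the paper.

```python
import numpy as np

def clipped_sgd(x0, stochastic_grad, eta=0.1, c=1.0, num_steps=1000):
    """Generic clipped SGD: at each step, clip the stochastic gradient
    to Euclidean norm at most c, then take a step of size eta.

    x0              -- initial iterate (numpy array)
    stochastic_grad -- callable returning a stochastic gradient at x
    eta, c          -- step size and clipping threshold (placeholders;
                       the paper's analysis prescribes its own choices)
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        g = stochastic_grad(x)
        norm = np.linalg.norm(g)
        if norm > c:
            g = g * (c / norm)   # clip: rescale gradient to norm c
        x = x - eta * g
    return x
```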
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 10891