Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 poster. License: CC BY 4.0
Keywords: convex optimization, stochastic optimization, smooth optimization, generalized smoothness
TL;DR: We show that clipped SGD converges with high probability on convex $(L_0,L_1)$-smooth functions.
Abstract: We study stochastic gradient descent (SGD) with gradient clipping on convex functions under a generalized smoothness assumption called $(L_0,L_1)$-smoothness. Using gradient clipping, we establish a high-probability convergence rate that matches the SGD rate in the $L$-smooth case up to polylogarithmic factors and additive terms. We also propose a variation of adaptive SGD with gradient clipping, which achieves the same guarantee. We perform empirical experiments to examine our theory and algorithmic choices.
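The abstract refers to SGD with gradient clipping. As a point of reference, below is a minimal sketch of a generic clipped-SGD update in Python; the function `stochastic_grad`, the clipping threshold `c`, and the step size `eta` are placeholders, not the specific choices analyzed in the paper.

```python
import numpy as np

def clipped_sgd(x0, stochastic_grad, eta=0.1, c=1.0, num_steps=1000):
    """Generic clipped SGD: at each step, clip the stochastic gradient
    to Euclidean norm at most c, then take a step of size eta.

    x0              -- initial iterate (numpy array)
    stochastic_grad -- callable returning a stochastic gradient at x
    eta, c          -- step size and clipping threshold (placeholders;
                       the paper's analysis prescribes its own choices)
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        g = stochastic_grad(x)
        norm = np.linalg.norm(g)
        if norm > c:
            g = g * (c / norm)   # clip: rescale gradient to norm c
        x = x - eta * g
    return x
```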
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 10891