High-Probability Bounds for the Last Iterate of Clipped SGD

High-Probability Bounds for the Last Iterate of Clipped SGD

ICLR 2026 Conference Submission18192 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Stochastic Optimization

Abstract: We study the problem of minimizing a convex objective when only noisy gradient estimates are available. Under the mild assumption that the stochastic gradients have finite $\alpha$-th moments for some $\alpha \in (1,2]$, we show that the last iterate of clipped stochastic gradient descent (Clipped-SGD) achieves high-probability convergence of order $1/K^{(2\alpha-2)/(3\alpha)}$ on smooth objectives. Finally, we provide empirical results that support and complement our theoretical analysis.

Primary Area: optimization

Submission Number: 18192

Loading