Keywords: Stochastic Optimization
Abstract: We study the problem of minimizing a convex objective when only noisy gradient estimates are available. Under the mild assumption that the stochastic gradients have finite $\alpha$-th moments for some $\alpha \in (1,2]$, we show that the last iterate of clipped stochastic gradient descent (Clipped-SGD) achieves a high-probability convergence rate of order $1/K^{(2\alpha-2)/(3\alpha)}$ after $K$ iterations on smooth objectives. Finally, we provide empirical results that support and complement our theoretical analysis.
Primary Area: optimization
Submission Number: 18192
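
For illustration, here is a minimal sketch of the clipped stochastic gradient update discussed in the abstract, assuming access to a noisy gradient oracle and a fixed clipping threshold; the function and parameter names (`clipped_sgd`, `grad_oracle`, `lam`) are illustrative assumptions, not the paper's implementation or its tuned step-size/threshold schedule.

```python
import numpy as np

def clipped_sgd(x0, grad_oracle, step_size, lam, num_steps):
    """Run Clipped-SGD and return the last iterate (a sketch, not the paper's code)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_oracle(x)                # noisy (possibly heavy-tailed) gradient estimate
        norm = np.linalg.norm(g)
        if norm > lam:                    # clip the gradient to norm at most lam
            g = g * (lam / norm)
        x = x - step_size * g             # standard SGD step on the clipped gradient
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2 with Student-t noise (finite alpha-th moment
# only for alpha < 1.5), matching the heavy-tailed setting described in the abstract.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    oracle = lambda x: x + rng.standard_t(df=1.5, size=x.shape)
    x_last = clipped_sgd(np.ones(10), oracle, step_size=0.05, lam=2.0, num_steps=5000)
    print(np.linalg.norm(x_last))
```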