Keywords: Stochastic gradient descent, nonconvex optimization, nonsmooth optimization, random-reshuffling stochastic gradient descent
TL;DR: This paper shows that an $\epsilon$-stationary point exists among the final iterates of SGD when minimizing nonconvex objectives, not just somewhere in the entire range of iterates---a much stronger result than existing ones.
Abstract: Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGD in the (strongly) convex case is well understood, its convergence for nonconvex functions stands on weaker mathematical foundations. Most existing studies of the nonconvex convergence of SGD establish complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with additional structural properties) by searching over the entire range of iterates. Hence the last iterates of SGD do not necessarily maintain the same complexity guarantee. This paper shows that an $\epsilon$-stationary point exists among the final iterates of SGD, not just somewhere in the entire range of iterates---a much stronger result than existing ones. Additionally, our analyses allow us to measure the \emph{density of the $\epsilon$-stationary points} in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the regularity of the objective function and the bounds on the stochastic gradient.
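To make the contrast in the abstract concrete, here is a minimal sketch (not taken from the paper) of the two types of guarantees, written under the standard smoothness and bounded stochastic-gradient assumptions that the abstract alludes to; the window size $k$ over the final iterates is a hypothetical placeholder for whatever "final iterates" means in the paper's analysis.

```latex
% Hedged illustration, not the paper's theorem statement.
% Classical nonconvex SGD result: the best iterate over the whole run is near-stationary,
%   min_{1 <= t <= T} E||grad f(x_t)||^2 = O(1/sqrt(T)),
% whereas the abstract claims an epsilon-stationary point among the *final* iterates
% (k below is a hypothetical window parameter, not defined in the abstract).
\[
  \underbrace{\min_{1 \le t \le T} \mathbb{E}\bigl[\|\nabla f(x_t)\|^2\bigr] \le \mathcal{O}\!\Bigl(\tfrac{1}{\sqrt{T}}\Bigr)}_{\text{classical guarantee over all iterates}}
  \qquad \text{vs.} \qquad
  \underbrace{\exists\, t \in \{T-k,\dots,T\}:\ \mathbb{E}\bigl[\|\nabla f(x_t)\|\bigr] \le \epsilon}_{\text{claimed guarantee over the final iterates}}
\]
```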
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Optimization (eg, convex and non-convex optimization)
Supplementary Material: zip