On the expected running time of nonconvex optimization with early stopping

Anonymous

Sep 25, 2019 ICLR 2020 Conference Blind Submission readers: everyone Show Bibtex
  • Keywords: non-convex, stopping times, statistics, gradient descent, early stopping
  • TL;DR: How to bound the expected number of iterations before gradient descent finds a stationary point
  • Abstract: This work examines the convergence of stochastic gradient algorithms that use early stopping based on a validation function, wherein optimization ends when the magnitude of a validation function gradient drops below a threshold. We derive conditions that guarantee this stopping rule is well-defined and analyze the expected number of iterations and gradient evaluations needed to meet this criteria. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach for stochastic gradient descent (SGD), allowing for biased update directions subject to a Lyapunov condition. We apply the approach to obtain new bounds on the expected running time of several algorithms, including Decentralized SGD (DSGD), a variant of decentralized SGD, known as \textit{Stacked SGD}, and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping.
0 Replies

Loading