A preconditioned accelerated stochastic gradient descent algorithm

Alexandru Onose; Seyed Iman Mossavat; Henk-Jan H. Smilde

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: We propose a preconditioned accelerated stochastic gradient method suitable for large-scale optimization. We derive sufficient convergence conditions for the minimization of convex functions using a generic class of diagonal preconditioners and provide a formal convergence proof based on a framework originally used for on-line learning. Inspired by recent popular adaptive per-feature algorithms, we propose a specific preconditioner based on the second moment of the gradient. The sufficient convergence conditions motivate a critical adaptation of the per-feature updates that is needed to ensure convergence. We show empirical results for the minimization of convex and non-convex cost functions in the context of neural network training. The method compares favorably with current first-order stochastic optimization methods.

Keywords: stochastic optimization, neural network, preconditioned accelerated stochastic gradient descent

TL;DR: We propose a preconditioned accelerated gradient method that combines Nesterov’s accelerated gradient descent with a class of diagonal preconditioners, in a stochastic setting.

Data: [MNIST](https://paperswithcode.com/dataset/mnist)
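
The combination named in the TL;DR (Nesterov-style acceleration plus a diagonal preconditioner built from the second moment of the gradient) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the update rule, the bias correction, the running-maximum safeguard that keeps per-feature step sizes from growing, and all hyperparameter values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def preconditioned_nesterov_sgd(grad_fn, x0, steps=500, lr=0.1, momentum=0.9,
                                beta2=0.999, eps=1e-8):
    """Nesterov-style momentum with a diagonal second-moment preconditioner.

    Illustrative sketch only; the rule and defaults are assumptions,
    not the method from the paper.
    """
    x = list(x0)
    velocity = [0.0] * len(x)
    second_moment = [0.0] * len(x)
    # Running maximum of the per-feature scale: it never decreases,
    # so the effective per-feature step lr / scale never grows.
    scale = [0.0] * len(x)

    for t in range(1, steps + 1):
        # Nesterov: evaluate the stochastic gradient at the look-ahead point.
        lookahead = [xi + momentum * vi for xi, vi in zip(x, velocity)]
        g = grad_fn(lookahead)
        for i, gi in enumerate(g):
            # Exponential moving average of the squared gradient.
            second_moment[i] = beta2 * second_moment[i] + (1 - beta2) * gi * gi
            corrected = second_moment[i] / (1 - beta2 ** t)  # bias correction
            scale[i] = max(scale[i], math.sqrt(corrected))
            # Preconditioned momentum update, one feature at a time.
            velocity[i] = momentum * velocity[i] - lr * gi / (scale[i] + eps)
            x[i] += velocity[i]
    return x


def noisy_quadratic_grad(curvatures, noise_std=0.01, seed=0):
    """Hypothetical test problem: gradient of 0.5 * sum(c_i * x_i**2) plus noise."""
    rng = random.Random(seed)

    def grad_fn(x):
        return [c * xi + rng.gauss(0.0, noise_std) for c, xi in zip(curvatures, x)]

    return grad_fn


if __name__ == "__main__":
    # Badly scaled convex problem: curvatures differ by a factor of 100.
    grad_fn = noisy_quadratic_grad([100.0, 1.0])
    print(preconditioned_nesterov_sgd(grad_fn, [1.0, 1.0]))
```

On the badly scaled quadratic above, dividing each coordinate's gradient by its own scale gives both coordinates a similar effective step, which is the usual motivation for diagonal preconditioning. The running maximum is one simple way to stop step sizes from increasing again once gradients shrink; whether it matches the adaptation described in the abstract is not something this sketch establishes.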
