A preconditioned accelerated stochastic gradient descent algorithm

Alexandru Onose, Seyed Iman Mossavat, Henk-Jan H. Smilde

Sep 27, 2018 · ICLR 2019 Conference Blind Submission
  • Abstract: We propose a preconditioned accelerated stochastic gradient method suitable for large-scale optimization. We derive sufficient convergence conditions for the minimization of convex functions using a generic class of diagonal preconditioners, and provide a formal convergence proof based on a framework originally used for online learning. Inspired by recent popular adaptive per-feature algorithms, we propose a specific preconditioner based on the second moment of the gradient. The sufficient convergence conditions motivate a critical adaptation of the per-feature updates in order to ensure convergence. We show empirical results for the minimization of convex and non-convex cost functions in the context of neural network training. The method compares favorably with current first-order stochastic optimization methods.
  • Keywords: stochastic optimization, neural network, preconditioned accelerated stochastic gradient descent
  • TL;DR: We propose a preconditioned accelerated gradient method that combines Nesterov’s accelerated gradient descent with a class of diagonal preconditioners, in a stochastic setting.
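To make the TL;DR concrete, here is a minimal sketch of one update step combining Nesterov's look-ahead gradient with a diagonal preconditioner built from a running second-moment estimate of the gradient. This is a hypothetical illustration of the general recipe, not the paper's exact update rule; the function name, hyperparameters, and the Adam-style second-moment estimate are assumptions.

```python
import numpy as np

def precond_nag_step(x, v, grad_fn, m2, lr=0.01, momentum=0.9,
                     beta=0.999, eps=1e-8):
    """One preconditioned Nesterov accelerated gradient step (sketch).

    Hypothetical: the diagonal preconditioner is the element-wise
    inverse square root of a running second-moment estimate of the
    gradient; the paper's precise per-feature adaptation may differ.
    """
    # Evaluate the (stochastic) gradient at the look-ahead point.
    g = grad_fn(x + momentum * v)
    # Running estimate of the gradient's second moment, per feature.
    m2 = beta * m2 + (1.0 - beta) * g * g
    # Diagonal preconditioner: element-wise inverse root of the estimate.
    precond = 1.0 / (np.sqrt(m2) + eps)
    # Accelerated, preconditioned velocity and parameter updates.
    v = momentum * v - lr * precond * g
    x = x + v
    return x, v, m2

# Usage: minimize the convex quadratic f(x) = ||x||^2 / 2 (gradient is x).
x = np.array([5.0, -3.0])
v = np.zeros_like(x)
m2 = np.zeros_like(x)
for _ in range(500):
    x, v, m2 = precond_nag_step(x, v, lambda z: z, m2)
```

Because the preconditioner is diagonal, the extra cost per step is a few element-wise vector operations, which is what makes this family of methods attractive at scale.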