- Abstract: We propose a preconditioned accelerated stochastic gradient method suitable for large-scale optimization. We derive sufficient convergence conditions for the minimization of convex functions using a generic class of diagonal preconditioners, and provide a formal convergence proof based on a framework originally used for online learning. Inspired by recent popular adaptive per-feature algorithms, we propose a specific preconditioner based on the second moment of the gradient. The sufficient convergence conditions motivate a critical adaptation of the per-feature updates in order to ensure convergence. We show empirical results for the minimization of convex and non-convex cost functions in the context of neural network training. The method compares favorably with current first-order stochastic optimization methods.
- Keywords: stochastic optimization, neural network, preconditioned accelerated stochastic gradient descent
- TL;DR: We propose a preconditioned accelerated gradient method that combines Nesterov’s accelerated gradient descent with a class of diagonal preconditioners, in a stochastic setting.
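The combination described above, Nesterov's accelerated gradient with a diagonal preconditioner built from the second moment of the gradient, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the hyperparameters (`lr`, `mu`, `beta`, `eps`), the AdaGrad-style second-moment accumulation, and the test function are all assumptions made for the example.

```python
import numpy as np

def preconditioned_nag(grad, x0, lr=0.5, mu=0.9, eps=1e-8, steps=500):
    """Sketch: Nesterov accelerated gradient with a diagonal
    preconditioner derived from the accumulated second moment
    of the gradient (hyperparameters are illustrative)."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)  # velocity (momentum buffer)
    v = np.zeros_like(x)  # per-feature second moment of the gradient
    for _ in range(steps):
        g = grad(x + mu * m)          # gradient at the Nesterov look-ahead point
        v += g * g                    # accumulate second moment per feature
        d = 1.0 / (np.sqrt(v) + eps)  # diagonal preconditioner
        m = mu * m - lr * d * g       # preconditioned momentum update
        x = x + m
    return x

# Usage: minimize an ill-conditioned quadratic f(x) = 0.5 x^T A x,
# where the per-feature preconditioner compensates for the poor scaling.
A = np.diag([100.0, 1.0])
x_star = preconditioned_nag(lambda x: A @ x, np.array([1.0, 1.0]))
```

In this sketch the preconditioner plays the same role as in the adaptive per-feature methods the abstract alludes to: coordinates with historically large gradients receive smaller effective step sizes.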