Exact Stochastic Newton Method for Deep Learning: the feedforward networks case

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
ICLR 2022 Submitted
Readers: Everyone
Keywords: Deep Learning, Second-order Optimization, Newton Method, Sifrian, Hessian, Exact Stochastic Newton, Saddle-Free Newton, Non-Convex Optimization
Abstract: The inclusion of second-order information in Deep Learning optimization has drawn sustained interest as a way to improve upon gradient descent methods. Computing the second-order update is often involved and computationally expensive, which drastically limits its scope of use and forces recourse to various truncations and approximations. This work demonstrates that the Newton direction can be solved for exactly in the stochastic case. We take feedforward networks as a base model, build a second-order Lagrangian which we call the Sifrian, and derive a closed-form formula for the exact stochastic Newton direction under certain monotonicity and regularization conditions. We propose a convexity correction to escape saddle points, and we revisit the intrinsic stochasticity of the online learning process to further improve the formulas. Finally, we compare the performance of the developed solution with well-established training methods and show its viability as a training method for Deep Learning.
One-sentence Summary: Theory and application of the stochastic second-order Newton method: a closed-form solution to train feedforward neural networks.
Supplementary Material: zip
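
Note: the Sifrian-based closed form itself is not given on this page, so it cannot be reproduced here. As a minimal illustrative sketch only, the snippet below shows the generic saddle-free Newton step that the abstract's "convexity correction" (cf. the Saddle-Free Newton keyword) alludes to: the Hessian's eigenvalues are replaced by their absolute values, plus a small damping term, before solving for the Newton direction. The toy loss, its saddle point, and all constants are hypothetical and not taken from the paper.

# Illustrative sketch only -- NOT the paper's Sifrian-based solution.
# Generic saddle-free Newton step: negative curvature is flipped so the
# update descends at saddle points instead of being attracted to them.
import numpy as np

def loss(w):
    # Hypothetical toy non-convex loss with a saddle point at the origin.
    return w[0] ** 2 - w[1] ** 2 + 0.1 * w[1] ** 4

def grad(w):
    return np.array([2.0 * w[0], -2.0 * w[1] + 0.4 * w[1] ** 3])

def hessian(w):
    return np.array([[2.0, 0.0],
                     [0.0, -2.0 + 1.2 * w[1] ** 2]])

def saddle_free_newton_step(w, damping=1e-4):
    g, H = grad(w), hessian(w)
    # Convexity correction: rebuild H from |eigenvalues| so that
    # directions of negative curvature are descended, not ascended.
    eigvals, eigvecs = np.linalg.eigh(H)
    H_abs = eigvecs @ np.diag(np.abs(eigvals) + damping) @ eigvecs.T
    return w - np.linalg.solve(H_abs, g)

w = np.array([0.2, 0.1])  # start near the saddle; plain Newton would converge to it
for _ in range(15):
    w = saddle_free_newton_step(w)
print(w, loss(w))  # escapes the saddle and reaches a local minimum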