L-SR1: A Second Order Optimization Method for Deep Learning

Vivek Ramamurthy, Nigel Duffy

Nov 04, 2016 (modified: Jan 14, 2017) ICLR 2017 conference submission
  • Abstract: We describe L-SR1, a new second order method to train deep neural networks. Second order methods hold great promise for distributed training of deep networks. Unfortunately, they have not proven practical. Two significant barriers to their success are inappropriate handling of saddle points, and poor conditioning of the Hessian. L-SR1 is a practical second order method that addresses these concerns. We provide experimental results showing that L-SR1 performs at least as well as Nesterov's Accelerated Gradient Descent, on the MNIST and CIFAR10 datasets. For the CIFAR10 dataset, we see competitive performance on shallow networks like LeNet5, as well as on deeper networks like residual networks. Furthermore, we perform an experimental analysis of L-SR1 with respect to its hyper-parameters to gain greater intuition. Finally, we outline the potential usefulness of L-SR1 in distributed training of deep neural networks.
  • TL;DR: We describe L-SR1, a new second order method to train deep neural networks.
  • Conflicts: sentient.ai
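The abstract builds on the symmetric rank-one (SR1) quasi-Newton update, whose ability to produce indefinite Hessian approximations is what makes it attractive near saddle points. The sketch below shows the classical SR1 update on a quadratic with an indefinite Hessian; it is a minimal illustration of the underlying update, not the paper's limited-memory (L-SR1) implementation, and all names are illustrative.

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """Standard SR1 update of Hessian approximation B,
    given step s and gradient change y (illustrative sketch)."""
    r = y - B @ s                      # residual of the secant condition
    denom = r @ s
    # Common safeguard: skip the update when the denominator is tiny.
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom  # symmetric rank-one correction

# On a quadratic f(x) = 0.5 x^T A x, gradient differences satisfy y = A s,
# so SR1 can recover A exactly -- even when A is indefinite, as at a saddle.
A = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Hessian (saddle point)
B = np.eye(2)
for s in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    y = A @ s
    B = sr1_update(B, s, y)
```

After the two updates, `B` equals `A`, including its negative eigenvalue; a BFGS update, by contrast, would keep the approximation positive definite and could not represent the saddle. A limited-memory variant, as the name L-SR1 suggests, would store only the recent `(s, y)` pairs rather than the full matrix.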
