Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
L-SR1: A Second Order Optimization Method for Deep Learning
Vivek Ramamurthy, Nigel Duffy
Nov 04, 2016 (modified: Jan 14, 2017)ICLR 2017 conference submissionreaders: everyone
Abstract:We describe L-SR1, a new second order method to train deep neural networks. Second order methods hold great promise for distributed training of deep networks. Unfortunately, they have not proven practical. Two significant barriers to their success are inappropriate handling of saddle points, and poor conditioning of the Hessian. L-SR1 is a practical second order method that addresses these concerns. We provide experimental results showing that L-SR1 performs at least as well as Nesterov's Accelerated Gradient Descent, on the MNIST and CIFAR10 datasets. For the CIFAR10 dataset, we see competitive performance on shallow networks like LeNet5, as well as on deeper networks like residual networks. Furthermore, we perform an experimental analysis of L-SR1 with respect to its hyper-parameters to gain greater intuition. Finally, we outline the potential usefulness of L-SR1 in distributed training of deep neural networks.
TL;DR:We describe L-SR1, a new second order method to train deep neural networks.
Enter your feedback below and we'll get back to you as soon as possible.