L-SR1: A Second Order Optimization Method for Deep Learning

Submitted to ICLR 2017
Abstract: We describe L-SR1, a new second order method to train deep neural networks. Second order methods hold great promise for distributed training of deep networks. Unfortunately, they have not proven practical. Two significant barriers to their success are inappropriate handling of saddle points and poor conditioning of the Hessian. L-SR1 is a practical second order method that addresses these concerns. We provide experimental results showing that L-SR1 performs at least as well as Nesterov's Accelerated Gradient Descent on the MNIST and CIFAR10 datasets. For the CIFAR10 dataset, we see competitive performance on shallow networks like LeNet5, as well as on deeper networks like residual networks. Furthermore, we perform an experimental analysis of L-SR1 with respect to its hyper-parameters to gain greater intuition. Finally, we outline the potential usefulness of L-SR1 in distributed training of deep neural networks.
TL;DR: We describe L-SR1, a new second order method to train deep neural networks.
Conflicts: sentient.ai
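
As context for the abstract: L-SR1 builds on the symmetric rank-one (SR1) quasi-Newton update, which, unlike BFGS, is allowed to produce indefinite Hessian approximations and so does not smooth away negative curvature near saddle points. The sketch below is a minimal, full-memory illustration of the classical SR1 update with its standard skip rule, applied to a toy quadratic; it is not the authors' limited-memory, trust-region implementation, and the function name, damping factor, and tolerance are illustrative assumptions only.

```python
import numpy as np

def sr1_update(B, s, y, skip_tol=1e-8):
    """One symmetric rank-one (SR1) quasi-Newton update of the Hessian
    approximation B, given the step s = x_new - x_old and the gradient
    difference y = g_new - g_old. The update is skipped when its
    denominator is too small, the standard SR1 safeguard."""
    r = y - B @ s                          # residual of the secant condition B s = y
    denom = float(r @ s)
    if abs(denom) <= skip_tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B                           # skip: update would be numerically unstable
    return B + np.outer(r, r) / denom      # rank-one correction; may be indefinite

# Toy usage on a quadratic f(x) = 0.5 x^T A x - b^T x, whose Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
B = np.eye(2)                              # initial Hessian approximation
for _ in range(20):
    g = grad(x)
    step = -np.linalg.solve(B, g)          # Newton-like step with the SR1 approximation
    x_new = x + 0.5 * step                 # fixed damping in place of a trust region
    B = sr1_update(B, x_new - x, grad(x_new) - g)
    x = x_new

print("approx minimizer:", x, " true minimizer:", np.linalg.solve(A, b))
```

On a quadratic, SR1 recovers the exact Hessian after a few linearly independent steps, which is why the skip rule matters: once the approximation is accurate, the update's denominator vanishes and must be guarded against. The paper's L-SR1 replaces the dense matrix B with a limited-memory representation and pairs the update with a trust-region step, rather than the fixed damping used here.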