Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix
Sebastien Arnold, Chunming Wang
Feb 17, 2017 (modified: Mar 15, 2017), ICLR 2017 workshop submission
Abstract: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters across multiple workers, we efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients will provide further information on the loss surface.
TL;DR: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix, in the distributed regime.
Keywords: Deep learning, Optimization
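
The abstract does not spell out the construction, but the idea of building an inverse-Hessian approximation from per-worker parameter and gradient differences can be sketched. Below is a minimal NumPy sketch of one plausible reading: worker 0 serves as a reference, the remaining $m$ workers supply difference pairs $(s_i, y_i)$, and a rank-$m$ least-squares fit to the secant condition $H^{-1} y_i \approx s_i$ yields the approximation used in a Newton-Raphson-style step. The pseudo-inverse construction, the reference-worker convention, and all function names here are illustrative assumptions, not details taken from the submission.

```python
import numpy as np

def approx_inverse_hessian(params, grads):
    """Rank-m approximation of the inverse Hessian from m+1 workers.

    params: (m+1, d) array of per-worker parameter vectors
    grads:  (m+1, d) array of the corresponding gradients
    (Assumed data layout; worker 0 is used as the reference point.)
    """
    S = (params[1:] - params[0]).T  # (d, m) parameter differences s_i
    Y = (grads[1:] - grads[0]).T    # (d, m) gradient differences y_i
    # Secant condition H s_i ~= y_i implies H^{-1} y_i ~= s_i, so a
    # rank-(<= m) least-squares fit is H^{-1} ~= S @ pinv(Y).
    return S @ np.linalg.pinv(Y)

def newton_step(theta, grad, params, grads, lr=1.0):
    """One Newton-Raphson-style update with the rank-m approximation."""
    H_inv = approx_inverse_hessian(params, grads)
    return theta - lr * (H_inv @ grad)

if __name__ == "__main__":
    # Sanity check on a toy quadratic 0.5 x'Ax - b'x, whose Hessian is A.
    rng = np.random.default_rng(0)
    d = 3
    A = rng.standard_normal((d, d))
    A = A @ A.T + d * np.eye(d)  # symmetric positive definite

    b = rng.standard_normal(d)

    def grad_f(x):
        return A @ x - b

    theta = rng.standard_normal(d)
    # Simulate m = d workers perturbed around theta.
    params = np.vstack(
        [theta] + [theta + 0.1 * rng.standard_normal(d) for _ in range(d)]
    )
    grads = np.vstack([grad_f(p) for p in params])

    theta_new = newton_step(theta, grad_f(theta), params, grads)
    # With d independent pairs the fit is exact, so one step reaches
    # the minimizer A^{-1} b, as Newton's method would.
    print(np.allclose(theta_new, np.linalg.solve(A, b)))
```

With fewer pairs than parameters ($m \ll d$, the realistic deep-learning case), the same construction gives only a rank-$m$ correction along the directions the workers have explored, which is why the abstract frames it as an approximation rather than an exact Newton step.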