Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix
Sebastien Arnold, Chunming Wang
Feb 17, 2017 (modified: Mar 15, 2017), ICLR 2017 workshop submission
Abstract: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters across multiple workers, we efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients will provide further information on the loss surface.
TL;DR: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix, in the distributed regime.
Keywords: Deep learning, Optimization
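
The abstract does not spell out the construction, but the idea of building an inverse-Hessian approximation from per-worker parameter and gradient differences can be sketched. Below is a minimal NumPy sketch of one plausible reading: worker 0 serves as a reference, the remaining $m$ workers supply difference pairs $(s_i, y_i)$, and a rank-$m$ least-squares fit to the secant condition $H^{-1} y_i \approx s_i$ yields the approximation used in a Newton-Raphson-style step. The pseudo-inverse construction, the reference-worker convention, and all function names here are illustrative assumptions, not details taken from the submission.

```python
import numpy as np

def approx_inverse_hessian(params, grads):
    """Rank-m approximation of the inverse Hessian from m+1 workers.

    params: (m+1, d) array of per-worker parameter vectors
    grads:  (m+1, d) array of the corresponding gradients
    (Assumed data layout; worker 0 is used as the reference point.)
    """
    S = (params[1:] - params[0]).T  # (d, m) parameter differences s_i
    Y = (grads[1:] - grads[0]).T    # (d, m) gradient differences y_i
    # Secant condition H s_i ~= y_i implies H^{-1} y_i ~= s_i, so a
    # rank-(<= m) least-squares fit is H^{-1} ~= S @ pinv(Y).
    return S @ np.linalg.pinv(Y)

def newton_step(theta, grad, params, grads, lr=1.0):
    """One Newton-Raphson-style update with the rank-m approximation."""
    H_inv = approx_inverse_hessian(params, grads)
    return theta - lr * (H_inv @ grad)

if __name__ == "__main__":
    # Sanity check on a toy quadratic 0.5 x'Ax - b'x, whose Hessian is A.
    rng = np.random.default_rng(0)
    d = 3
    A = rng.standard_normal((d, d))
    A = A @ A.T + d * np.eye(d)  # symmetric positive definite

    b = rng.standard_normal(d)

    def grad_f(x):
        return A @ x - b

    theta = rng.standard_normal(d)
    # Simulate m = d workers perturbed around theta.
    params = np.vstack(
        [theta] + [theta + 0.1 * rng.standard_normal(d) for _ in range(d)]
    )
    grads = np.vstack([grad_f(p) for p in params])

    theta_new = newton_step(theta, grad_f(theta), params, grads)
    # With d independent pairs the fit is exact, so one step reaches
    # the minimizer A^{-1} b, as Newton's method would.
    print(np.allclose(theta_new, np.linalg.solve(A, b)))
```

With fewer pairs than parameters ($m \ll d$, the realistic deep-learning case), the same construction gives only a rank-$m$ correction along the directions the workers have explored, which is why the abstract frames it as an approximation rather than an exact Newton step.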