Boosting Gradient-based Optimizers for Asynchronous Parallelism
Shuai Li, Yi Ren, Dongchang Xu, Lin Guo, Hang Xiang, Di Zhang, Jinhui Li
Feb 07, 2018 (modified: Feb 11, 2018) · ICLR 2018 Workshop Submission · Readers: everyone
Abstract: Stochastic gradient descent methods have been broadly used to train deep neural network models. However, the classic approaches may suffer from gradient delay, which perturbs training under asynchronous parallelism. In this paper, we present an approach that tackles this challenge by adaptively adjusting the size of each optimization step. We demonstrate that our approach significantly boosts the SGD, AdaGrad, and Momentum optimizers on two very different tasks: image classification and click-through-rate prediction.
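The abstract does not specify how the step size is adapted, but a common way to mitigate gradient delay in asynchronous training is to shrink the learning rate as staleness grows. The sketch below is purely illustrative and is not the authors' method; the `1 / (1 + delay)` scaling rule and the function name `stale_sgd_step` are assumptions chosen for clarity.

```python
import numpy as np

def stale_sgd_step(params, grad, lr, delay):
    """One SGD update whose step size shrinks with gradient staleness.

    delay: number of global updates that occurred between when this
    worker read `params` and when its gradient arrives (0 = fresh).
    Illustrative staleness-aware rule, not the paper's actual method.
    """
    effective_lr = lr / (1.0 + delay)  # assumed scaling rule
    return params - effective_lr * grad

# A fresh gradient takes the full step; a stale one takes a smaller step.
params = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
fresh = stale_sgd_step(params, grad, lr=0.1, delay=0)  # step of 0.05
stale = stale_sgd_step(params, grad, lr=0.1, delay=4)  # step of 0.01
```

In an asynchronous setting, each worker computes `grad` against a possibly outdated copy of the parameters, so damping stale updates keeps them from overshooting the current iterate.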