Boosting Gradient-based Optimizers for Asynchronous Parallelism
Shuai Li, Yi Ren, Dongchang Xu, Lin Guo, Hang Xiang, Di Zhang, Jinhui Li
Feb 07, 2018 (modified: Feb 11, 2018) · ICLR 2018 Workshop Submission · Readers: everyone
Abstract: Stochastic gradient descent methods have been broadly used to train deep neural network models. However, the classic approaches may suffer from gradient delay, which perturbs training under asynchronous parallelism. In this paper, we present an approach that tackles this challenge by adaptively adjusting the size of each optimization step. We demonstrate that our approach significantly boosts the SGD, AdaGrad, and Momentum optimizers on two very different tasks: image classification and click-through-rate prediction.
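The abstract does not specify how the step size is adapted, but a common way to mitigate gradient delay in asynchronous training is to shrink the learning rate as staleness grows. The sketch below is purely illustrative and is not the authors' method; the `1 / (1 + delay)` scaling rule and the function name `stale_sgd_step` are assumptions chosen for clarity.

```python
import numpy as np

def stale_sgd_step(params, grad, lr, delay):
    """One SGD update whose step size shrinks with gradient staleness.

    delay: number of global updates that occurred between when this
    worker read `params` and when its gradient arrives (0 = fresh).
    Illustrative staleness-aware rule, not the paper's actual method.
    """
    effective_lr = lr / (1.0 + delay)  # assumed scaling rule
    return params - effective_lr * grad

# A fresh gradient takes the full step; a stale one takes a smaller step.
params = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
fresh = stale_sgd_step(params, grad, lr=0.1, delay=0)  # step of 0.05
stale = stale_sgd_step(params, grad, lr=0.1, delay=4)  # step of 0.01
```

In an asynchronous setting, each worker computes `grad` against a possibly outdated copy of the parameters, so damping stale updates keeps them from overshooting the current iterate.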