Dynamic Delay Based Cyclic Gradient Update Method for Distributed Training

PRCV (3) 2018
Abstract: Distributed training performance is constrained by two factors. One is the communication overhead between parameter servers and workers. The other is the imbalanced computing power across workers. We propose a dynamic-delay-based cyclic gradient update method, which allows workers to push gradients to parameter servers in a round-robin order with dynamic delays. Stale gradient information is accumulated locally in each worker. When a worker obtains the token to update gradients, the accumulated gradients are pushed to the parameter servers. Experiments show that, compared with previous synchronous and cyclic gradient update methods, the dynamic-delay cyclic method reaches the same accuracy faster.
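The abstract only sketches the mechanism, so the following is a minimal single-process illustration of the idea: workers accumulate gradients locally and push the accumulated sum only when a round-robin token reaches them, so each worker's delay varies dynamically with its speed. All names here (ParameterServer, try_push, worker_loop) are hypothetical, and the accumulated gradients are averaged over the delay to keep the step size bounded; the paper's exact scaling of stale gradients may differ.

```python
import threading
import numpy as np

class ParameterServer:
    """Shared parameters plus a round-robin token that decides which
    worker is allowed to push next (illustrative, not the authors' code)."""

    def __init__(self, dim, num_workers, lr=0.1):
        self.params = np.zeros(dim)
        self.lr = lr
        self.num_workers = num_workers
        self.token = 0                       # id of the worker whose turn it is
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.params.copy()

    def try_push(self, worker_id, grad_sum, delay):
        """Apply an update only if this worker holds the token; otherwise
        the caller keeps accumulating and its delay keeps growing."""
        with self.lock:
            if self.token != worker_id:
                return False
            # Assumption: average over the delay so the update magnitude
            # stays bounded no matter how long the worker waited.
            self.params -= self.lr * grad_sum / delay
            self.token = (self.token + 1) % self.num_workers
            return True


def worker_loop(server, worker_id, steps, compute_grad):
    grad_sum, delay = 0.0, 0
    for _ in range(steps):
        g = compute_grad(server.pull())            # gradient on possibly stale params
        grad_sum, delay = grad_sum + g, delay + 1  # accumulate stale gradients locally
        if server.try_push(worker_id, grad_sum, delay):
            grad_sum, delay = 0.0, 0               # pushed on our turn; reset


if __name__ == "__main__":
    # Toy run: 3 workers jointly minimizing ||x - 1||^2; thread scheduling
    # stands in for the heterogeneous worker speeds the paper targets.
    server = ParameterServer(dim=4, num_workers=3)
    grad = lambda p: 2.0 * (p - 1.0)
    threads = [threading.Thread(target=worker_loop, args=(server, i, 500, grad))
               for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("final params:", server.pull())          # approaches the optimum at 1.0
```

Because try_push is non-blocking, a slow worker never stalls the others; fast workers simply keep accumulating until the token cycles back to them, which is how the dynamic delay arises in this sketch.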