## Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

May 21, 2021 (edited Oct 26, 2021)NeurIPS 2021 PosterReaders: Everyone
• Keywords: distributed optimization, distributed training, federated learning, latency
• TL;DR: A method optimizes the training speed for high latency network
• Abstract: Federated Learning is an emerging direction in distributed machine learning that en-ables jointly training a model without sharing the data. Since the data is distributed across many edge devices through wireless / long-distance connections, federated learning suffers from inevitable high communication latency. However, the latency issues are undermined in the current literature [15] and existing approaches suchas FedAvg [27] become less efficient when the latency increases. To over comethe problem, we propose \textbf{D}elayed \textbf{G}radient \textbf{A}veraging (DGA), which delays the averaging step to improve efficiency and allows local computation in parallel tocommunication. We theoretically prove that DGA attains a similar convergence rate as FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. Specifically, we benchmark the training speed on various vision (CIFAR, ImageNet) and language tasks (Shakespeare),with both IID and non-IID partitions, and show DGA can bring 2.55$\times$ to 4.07$\times$ speedup. Moreover, we built a 16-node Raspberry Pi cluster and show that DGA can consistently speed up real-world federated learning applications.
• Supplementary Material: pdf
• Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
22 Replies