Delayed Gradient Averaging: Tolerate the Communication Latency for Federated LearningDownload PDF

Published: 09 Nov 2021, Last Modified: 31 Oct 2023NeurIPS 2021 PosterReaders: Everyone
Keywords: distributed optimization, distributed training, federated learning, latency
TL;DR: A method optimizes the training speed for high latency network
Abstract: Federated Learning is an emerging direction in distributed machine learning that en-ables jointly training a model without sharing the data. Since the data is distributed across many edge devices through wireless / long-distance connections, federated learning suffers from inevitable high communication latency. However, the latency issues are undermined in the current literature [15] and existing approaches suchas FedAvg [27] become less efficient when the latency increases. To over comethe problem, we propose \textbf{D}elayed \textbf{G}radient \textbf{A}veraging (DGA), which delays the averaging step to improve efficiency and allows local computation in parallel tocommunication. We theoretically prove that DGA attains a similar convergence rate as FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. Specifically, we benchmark the training speed on various vision (CIFAR, ImageNet) and language tasks (Shakespeare),with both IID and non-IID partitions, and show DGA can bring 2.55$\times$ to 4.07$\times$ speedup. Moreover, we built a 16-node Raspberry Pi cluster and show that DGA can consistently speed up real-world federated learning applications.
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
22 Replies