Communication Efficient Federated Learning over Wireless Channels

TMLR Paper2586 Authors

25 Apr 2024 (modified: 21 May 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Large-scale federated learning (FL) over wireless multiple access channels (MACs) has emerged as a crucial learning paradigm with a wide range of applications. However, its widespread adoption is hindered by several major challenges, including limited bandwidth shared by many edge devices, noisy and erroneous wireless communications, and heterogeneous datasets with different distributions across edge devices. To overcome these fundamental challenges, we propose Federated Proximal Sketching (FPS), which is tailored to band-limited wireless channels and handles data heterogeneity across edge devices. FPS uses a count sketch data structure to address the bandwidth bottleneck, enabling efficient compression while maintaining accurate estimates of significant coordinates. Additionally, we augment the loss function in FPS with a proximal term so that it can cope with varying degrees of data heterogeneity. We establish the convergence of FPS under mild technical conditions and characterize how the bias induced by factors such as data heterogeneity and noisy wireless channels affects the overall result. We complement the theoretical framework with numerical experiments that demonstrate the stability, accuracy, and efficiency of FPS compared with state-of-the-art methods on both synthetic and real-world datasets. Overall, our results show that FPS is a promising solution to the above challenges of FL over wireless MACs.
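A count sketch is a standard linear sketching structure. The minimal Python sketch below is not the authors' FPS implementation; it only illustrates the two properties the abstract relies on: a compressed table from which large ("significant") coordinates can be estimated as a median over hash rows, and linearity, so that the superposition of device sketches on an analog MAC is itself a sketch of the summed gradient. The row and column counts, the shared hash seed, the helper names (`noisy_mac_sum`, `make_grad`), and the additive Gaussian channel-noise model are all illustrative assumptions.

```python
import random
import statistics


class CountSketch:
    """Minimal count sketch over R^dim with `rows` hash rows of `cols` buckets each."""

    def __init__(self, dim, rows, cols, seed=0):
        rng = random.Random(seed)  # assumed shared seed: all devices and the server use the same hashes
        self.dim, self.rows, self.cols = dim, rows, cols
        self.bucket = [[rng.randrange(cols) for _ in range(dim)] for _ in range(rows)]
        self.sign = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(rows)]

    def sketch(self, vec):
        """Compress a length-`dim` vector into a rows x cols table (a linear map)."""
        table = [[0.0] * self.cols for _ in range(self.rows)]
        for r in range(self.rows):
            b, s, t = self.bucket[r], self.sign[r], table[r]
            for i, v in enumerate(vec):
                t[b[i]] += s[i] * v
        return table

    def estimate(self, table, i):
        """Estimate coordinate i as the median of its signed bucket values across rows."""
        return statistics.median(
            self.sign[r][i] * table[r][self.bucket[r][i]] for r in range(self.rows)
        )

    def top_k(self, table, k):
        """Return {index: estimate} for the k coordinates with the largest |estimate|."""
        est = [self.estimate(table, i) for i in range(self.dim)]
        idx = sorted(range(self.dim), key=lambda i: abs(est[i]), reverse=True)[:k]
        return {i: est[i] for i in idx}


def noisy_mac_sum(tables, noise_std, seed=1):
    """Hypothetical analog MAC: entry-wise sum of all device tables plus Gaussian noise."""
    rng = random.Random(seed)
    rows, cols = len(tables[0]), len(tables[0][0])
    return [[sum(t[r][c] for t in tables) + rng.gauss(0.0, noise_std) for c in range(cols)]
            for r in range(rows)]


def make_grad(dim, heavy, rng):
    """Hypothetical device gradient: small dense values plus a few large coordinates."""
    g = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for i, v in heavy.items():
        g[i] = v
    return g


# Usage: two devices, 10,000-dim gradients, each transmitting a 7 x 200 = 1,400-entry table.
rng = random.Random(42)
cs = CountSketch(dim=10_000, rows=7, cols=200)
g1 = make_grad(cs.dim, {3: 10.0, 7: -8.0}, rng)
g2 = make_grad(cs.dim, {3: 6.0, 7: -4.0}, rng)
received = noisy_mac_sum([cs.sketch(g1), cs.sketch(g2)], noise_std=0.05)
top = cs.top_k(received, k=2)
```

In this toy setting the summed large coordinates (16 and -12) should dominate the server's estimates even though each device sends roughly 7x fewer values than its gradient has. The median over rows is what keeps occasional hash collisions and channel noise from corrupting the estimate of a significant coordinate.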
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: This is a resubmission of our previous work that addresses the previous comments and concerns. The previous submission can be found at: . The changes and updates are highlighted in blue in the current submission. The resubmission addresses the following concerns:
1. Our theoretical results support the claim that the proposed algorithm can handle high levels of heterogeneity. We state our main result, Theorem 1, in a clean and concise form that illustrates a trade-off between the size of the neighborhood to which the algorithm converges and the level of data heterogeneity across edge devices. In the remarks following the theorem, we discuss the slow convergence of the algorithm when data are highly heterogeneous across devices. These remarks also explain the role of the various constants (such as $P_b, P_n, E, c, k$) in the convergence of the algorithm.
2. Our work is related to the FedProx algorithm (Li et al., 2020) through the use of an additional proximal term in the loss function. Related works (see Section 2.2 in the paper) have used proximal terms to mitigate the effects of noise during training; however, our empirical results suggest that a proximal term alone is not enough, as FedProx performs poorly in noisy settings. Our empirical studies indicate that it is the combination of a proximal term with the robust properties of the count sketch data structure that allows our proposed algorithm, FPS, to perform well in noisy band-limited settings. This narrative is reflected in our introduction.
3. Empirically, we extend our simulations to include a popular ML dataset, MNIST, and discuss how our algorithm performs on it in a band-limited noisy wireless channel setting. Additional results in a noise-free band-limited setting are shown in the appendix.
4. It is important to discuss when FPS can be advantageous over other methods and when other state-of-the-art (SOTA) methods could be preferred. To address this, we include a discussion paragraph in our experimental results section highlighting the merits and limitations of our approach.
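Item 2 attributes part of FPS's robustness to the proximal term it shares with FedProx (Li et al., 2020). The Python sketch below shows only that generic FedProx-style local update, gradient descent on f_k(w) + (mu/2)·||w - w_global||^2; the quadratic local loss, `mu`, the learning rate, the step count, and the helper name `prox_local_steps` are hypothetical, and how FPS combines this update with the count sketch is not reproduced here.

```python
def prox_local_steps(grad_fn, w_global, mu, lr, steps):
    """Gradient descent on f_k(w) + (mu / 2) * ||w - w_global||^2, started at w_global."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # The proximal term contributes mu * (w - w_global), pulling w back toward the global model.
        w = [wi - lr * (gi + mu * (wi - wgi)) for wi, gi, wgi in zip(w, g, w_global)]
    return w


# Hypothetical heterogeneous device: local loss f_k(w) = 0.5 * ||w - c||^2,
# whose optimum c lies far from the current global model.
c = [4.0, -2.0]


def local_grad(w):
    """Gradient of the hypothetical local loss 0.5 * ||w - c||^2."""
    return [wi - ci for wi, ci in zip(w, c)]


w_global = [0.0, 0.0]
w_plain = prox_local_steps(local_grad, w_global, mu=0.0, lr=0.1, steps=200)
w_prox = prox_local_steps(local_grad, w_global, mu=1.0, lr=0.1, steps=200)
```

With mu = 0 the device drifts all the way to its own optimum c = [4, -2]; with mu = 1 it settles at the minimizer of the proximal objective, (c + mu * w_global) / (1 + mu) = [2, -1]. This is the sense in which the proximal term limits client drift when local data distributions differ.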
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 2586