Enhanced Federated Optimization: Adaptive Unbiased Client Sampling with Reduced Variance

TMLR Paper3267 Authors

31 Aug 2024 (modified: 25 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Federated Learning (FL) is a distributed learning paradigm that trains a global model across multiple devices without collecting local data. In FL, a server typically selects a subset of clients for each training round to optimize resource usage. Central to this process is unbiased client sampling, which ensures a representative selection of clients. Current methods primarily rely on a random sampling procedure which, despite its effectiveness, yields suboptimal efficiency owing to a loose convergence upper bound caused by sampling variance. In this work, by adopting an independent sampling procedure, we propose a federated optimization framework focused on adaptive unbiased client sampling, improving the convergence rate via an online variance reduction strategy. In particular, we present the first adaptive client sampler, K-Vib, employing an independent sampling procedure. K-Vib achieves a linear speed-up on the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ within a fixed communication budget $K$. Empirical studies show that K-Vib achieves roughly a twofold speed-up over baseline algorithms, demonstrating significant potential in federated optimization.
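To make the "unbiased independent sampling" idea in the abstract concrete, below is a minimal sketch (not the paper's K-Vib sampler) of how a server could aggregate updates when each client is included independently with its own inclusion probability and sampled updates are reweighted by inverse probabilities, so the aggregate remains an unbiased estimate of full participation. All names (independent_unbiased_aggregate, probs, weights, updates) are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def independent_unbiased_aggregate(updates, weights, probs, rng=None):
    """Aggregate client updates under independent (Bernoulli) sampling.

    Each client i is included independently with probability probs[i];
    sampled updates are scaled by weights[i] / probs[i], so the expected
    aggregate equals the full-participation sum_i weights[i] * updates[i]
    (an unbiased, Horvitz-Thompson-style estimator).
    """
    rng = np.random.default_rng() if rng is None else rng
    sampled = rng.random(len(probs)) < probs            # independent Bernoulli draws
    agg = np.zeros_like(updates[0])
    for i in np.flatnonzero(sampled):
        agg += (weights[i] / probs[i]) * updates[i]     # inverse-probability weighting
    return agg, sampled

# Toy usage: 10 clients, expected communication budget K = 3.
N, K = 10, 3
rng = np.random.default_rng(0)
updates = [rng.normal(size=5) for _ in range(N)]        # stand-ins for local model updates
weights = np.full(N, 1.0 / N)                           # aggregation weights (e.g., data fractions)
probs = np.full(N, K / N)                               # inclusion probabilities summing to K
agg, sampled = independent_unbiased_aggregate(updates, weights, probs, rng)
```

An adaptive sampler such as the one described in the abstract would, under this setup, adjust the per-client probabilities online to reduce the variance of this estimator; the uniform probabilities above are only a placeholder.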
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=CKQ3sMt4tx
Changes Since Last Submission: We thank the AC and all the reviewers for their efforts in reviewing our work. A summary of the changes:
- We carefully refined the writing to ensure the theoretical results are clearly connected, revised several concept descriptions for better understanding, and reorganized the experiment section for a smoother reading experience.
- We refined the convergence analysis of FedAvg with arbitrary client sampling; the new convergence rate matches the "optimal client sampling" result of previous work.
- Building on the new convergence analysis, we provide end-to-end convergence guarantees (FedAvg + K-Vib) at the end of Section 5.
- We added two natural language processing tasks, each with three levels of data distribution. Most importantly, the new experiments involve larger models based on the popular Transformer and BERT architectures, trained on the large-scale AGNews and CCNews datasets, demonstrating the proposed method's applicability to real-world settings.
- Explanation of recent updates: we revised the convergence analysis and fixed a few constant errors; our main conclusions do not change.
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 3267