When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning

Published: 18 May 2023, Last Modified: 18 May 2023. Accepted by TMLR.
Abstract: Federated learning has become a widely used framework for learning a global model on decentralized local datasets while protecting local data privacy. However, federated learning faces severe optimization difficulty when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that the client sampling practice plays a decisive role in this optimization difficulty. We find that negative client sampling causes the merged data distribution of the currently sampled clients to be heavily inconsistent with that of all available clients, which in turn makes the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism that adjusts the server learning rate for the aggregated gradient in each round according to the consistency between the merged data distribution of the currently sampled clients and that of all available clients. Specifically, we make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate, defined as the rate that minimizes the Euclidean distance between the aggregated gradient given the currently sampled clients and the aggregated gradient that would be obtained if all clients could participate in the current round. We show that the proposed indicator effectively reflects the merged data distribution of the sampled clients, and we therefore use it for server learning rate adaptation. Extensive experiments on multiple image and text classification tasks validate the effectiveness of our method in various settings. Our code is available at https://github.com/lancopku/FedGLAD.
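The distance-minimization criterion in the abstract can be illustrated with a minimal sketch. This is not the FedGLAD implementation: it assumes, purely for illustration, that the full-participation gradient is known (in practice it is not, which is why the paper derives an indicator instead), and the names `dot`, `scaling_minimizing_distance`, `g_sampled`, and `g_full` are hypothetical. It shows only the standard least-squares fact that the scalar c minimizing ||c * g_sampled - g_full|| is <g_sampled, g_full> / ||g_sampled||^2.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scaling_minimizing_distance(
    g_sampled: Sequence[float], g_full: Sequence[float]
) -> float:
    """Return the scalar c that minimizes ||c * g_sampled - g_full||_2.

    Hypothetical helper: g_full stands in for the gradient that full client
    participation would give, which a real server cannot observe.
    """
    denom = dot(g_sampled, g_sampled)
    if denom == 0.0:
        # A zero sampled gradient cannot be rescaled toward g_full.
        return 0.0
    return dot(g_sampled, g_full) / denom


# Toy vectors (made up for illustration, not data from the paper).
g_full = [1.0, 0.0]
aligned = [2.0, 0.0]  # same direction as g_full, twice the norm
skewed = [0.0, 3.0]   # orthogonal to g_full

c_aligned = scaling_minimizing_distance(aligned, g_full)
c_skewed = scaling_minimizing_distance(skewed, g_full)
```

In this toy setting, a sampled gradient that points in the same direction as the full-participation gradient keeps a positive scaling, while an orthogonal one is scaled to zero, which matches the intuition that an unreliable aggregated gradient should receive a smaller server learning rate.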
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: **First round revision:** (1) Add empirical results in Appendix C to validate our assumption on the scale component. (2) Add results in Appendix O to show the effect of using $B^{t}$ to estimate the unknown $GSI^{t}_{c}$. (3) Add results in Appendix P to visualize the patterns of adapted server learning rates. (4) Display the results of combining our method with a weighted averaging-based IDA in Appendix Q. (5) Explore the fairness of our method on each client in Appendix R. (6) Add results on more realistic federated datasets in Appendix S. **Camera ready revision:** (1) Revise the description of the term "optimal server learning rate" in the abstract and main paper to make it clearer. (2) Rewrite the proposed $GSI$ in terms of variance in Eq. (10) to better interpret its meaning at the end of Section 3.2. (3) Add the results on StackOverflow in Appendix S to further strengthen the empirical effectiveness of our method. (4) Add the Acknowledgments section. Finally, we sincerely thank all the reviewers and the Action Editors for their great efforts in the review process and their valuable suggestions for improving the paper.
Code: https://github.com/lancopku/FedGLAD
Supplementary Material: zip
Assigned Action Editor: ~Yaoliang_Yu1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 836