Harnessing Heterogeneity: Improving Convergence Through Partial Variance Control in Federated Learning
Abstract: Federated Learning (FL) has emerged as a promising paradigm for collaborative model training without sharing local data. However, a significant challenge in FL arises from the heterogeneous data distributions across participating clients. This heterogeneity leads to highly variable gradient norms in the model's final layers, resulting in poor generalization, slower convergence, and reduced robustness of the global model. To address these issues, we propose a novel technique that incorporates a gradient penalty term into partial variance control. Our method enables diverse representation learning from heterogeneous client data in the initial layers while modifying standard SGD in the final layers. This approach reduces variance in the classification layers, aligns gradients, and mitigates the effects of data heterogeneity. Through theoretical analysis, we establish convergence rate bounds for the proposed algorithm, demonstrating its potential for competitive convergence compared to current FL methods in highly heterogeneous data settings. Empirical evaluations on five benchmark datasets validate our approach, showing enhanced performance and faster convergence over state-of-the-art baselines across various levels of data heterogeneity.
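To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of a client-side update in which the representation layers follow plain SGD while only the final classification layer receives an extra gradient-penalty term. The function name, the `penalty_coef` hyperparameter, and the `head_name` prefix convention are illustrative assumptions, not the authors' actual implementation or released code.

```python
# Hypothetical sketch of a local client update with a gradient penalty applied
# only to the classification head ("partial variance control" in spirit).
import torch
import torch.nn.functional as F

def local_update(model, loader, lr=0.01, penalty_coef=0.1, head_name="fc"):
    """One local epoch of SGD; penalty_coef and head_name are assumed names."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)

        # Penalize the squared gradient norm of the final-layer parameters only;
        # early layers keep their standard SGD update.
        head_params = [p for n, p in model.named_parameters() if n.startswith(head_name)]
        head_grads = torch.autograd.grad(loss, head_params, create_graph=True)
        grad_norm_sq = sum(g.pow(2).sum() for g in head_grads)

        (loss + penalty_coef * grad_norm_sq).backward()
        opt.step()
    return model.state_dict()
```

The returned state dict would then be aggregated at the server as in standard FedAvg-style training.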
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=jCdNX32VGi
Changes Since Last Submission: We thank the Action Editor for giving us the opportunity to resubmit our paper. We also thank all the reviewers for their appreciation of the novelty and contributions of our work. We have updated the manuscript based on the reviewers' suggestions.
$\textbf{Changes since the last TMLR submission:}$
We have revised the convergence proof in Section 6 of the appendix, following suggestions from reviewer Zxic, and addressed previous limitations to ensure it covers all cases. Additionally, both proofs in Subsection 6.0.1 are now presented step by step to enhance clarity and facilitate understanding for our readers. Please review the updates and feel free to suggest any improvements.
We have used the experimental setting described in the FedCorr paper [1] for all our experiments (see Subsection 5.1). During the discussion period we mistakenly cited the wrong papers, which led to some initial confusion; in a subsequent response we apologized for the mistake and cited the correct paper.
$\textbf{Here is a summary of the reviews and our responses from the first submission:}$
Based on the suggestions from reviewer $\textbf{cq6m}$, we have validated our approach with additional experiments on large and challenging datasets such as Tiny-ImageNet. The results of these experiments are presented in Table 5 of the revised manuscript, and further details are provided in Section 8. We have also conducted additional experiments with popular and more complex backbones, namely ResNet18 and ViT; the results are presented in Table 5 and discussed in Section 8 of the updated manuscript. To assess the generalizability of our approach, we conducted additional experiments on a natural language understanding task using the popular QQP benchmark dataset. The results, detailed in Table 5 and Section 8 of the manuscript, indicate that our proposed model outperforms all baselines.
As per the recommendation of reviewer $\textbf{eMb4}$, we have moved the statements of the theorems into the main paper and included an informal presentation (see Section 4). Additionally, we have incorporated a discussion that links these guarantees to those of existing algorithms to provide further context. We have carefully revised the proofs to provide a more transparent and reader-friendly explanation, and we now reference all assumptions and properties by their respective equation or assumption numbers, as reviewer eMb4 suggested. We hope these adjustments will allow readers to follow the derivations more easily. We have also addressed all the minor suggestions in the ‘Requested Changes’ section and updated the manuscript accordingly.
We have addressed all the minor suggestions that reviewer $\textbf{Zxic}$ pointed out and updated the manuscript accordingly. Based on reviewer Zxic's suggestions, we conducted additional experiments on a natural language understanding task using the popular QQP benchmark dataset; the results are reported in Table 5 and Section 8 of the updated manuscript. We have also clarified in the manuscript that our experimental procedures align with those outlined in the FedCorr paper [1].
Reference:
1. Xu, Jingyi, et al. "FedCorr: Multi-Stage Federated Learning for Label Noise Correction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Assigned Action Editor: ~Grigorios_Chrysos1
Submission Number: 3910