A Simple Data Augmentation for Feature Distribution Skewed Federated Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: federated learning, data heterogeneity, data augmentation.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner while ensuring privacy protection. However, its performance inevitably degrades when it suffers from data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is a common setting in real-world applications. The main challenge in this scenario is feature shift, which is caused by the differing underlying distributions of local datasets. Although previous attempts have achieved impressive progress, few studies pay attention to the data itself, i.e., the root of this issue. To this end, the primary goal of this paper is to develop a general data augmentation technique at the input level to mitigate the feature shift problem. To achieve this goal, we propose a simple yet remarkably effective data augmentation method for feature distribution skewed FL, namely FedRDN, which randomly injects the statistics of datasets from across the entire federation into the client's data. Our method thus effectively improves the generalization of features, thereby mitigating the feature shift problem. Moreover, FedRDN is a plug-and-play component that can be seamlessly integrated into the data augmentation flow with only a few lines of code. Extensive experiments on several datasets show that the performance of various representative FL methods can be further improved by integrating FedRDN, which demonstrates its strong scalability and generalizability. The source code will be released.
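The abstract describes FedRDN as injecting federation-wide dataset statistics into each client's data as an input-level augmentation. A minimal sketch of that idea is shown below, assuming each client shares per-channel mean/std statistics of its local dataset (the statistics, client count, and function name here are illustrative assumptions, not the paper's released implementation):

```python
import random
import numpy as np

# Hypothetical per-channel (mean, std) statistics, one pair per client in the
# federation; in practice each client would compute these on its local dataset
# and share them with the federation.
FEDERATION_STATS = [
    (np.array([0.48, 0.45, 0.40]), np.array([0.22, 0.22, 0.23])),  # client 0
    (np.array([0.60, 0.58, 0.55]), np.array([0.18, 0.19, 0.20])),  # client 1
    (np.array([0.35, 0.33, 0.30]), np.array([0.25, 0.24, 0.26])),  # client 2
]

def fedrdn_augment(image, stats=FEDERATION_STATS):
    """Normalize an HxWxC image with the statistics of a randomly chosen
    client, injecting that client's feature distribution into the sample."""
    mean, std = random.choice(stats)
    return (image - mean) / std

# Usage: augment one 32x32 RGB image during a client's local training step.
img = np.random.rand(32, 32, 3).astype(np.float32)
aug = fedrdn_augment(img)
print(aug.shape)  # (32, 32, 3)
```

Because the augmentation only swaps which statistics are used at normalization time, it slots into an existing preprocessing pipeline without touching the model or the FL training algorithm, which is consistent with the paper's plug-and-play claim.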
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4875