FLea: Improving federated learning on scarce and label-skewed data via privacy-preserving feature augmentation

22 Sept 2023 (modified: 18 Feb 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Federated learning, label skew, data scarcity, classification
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present the first study on a common but under-explored scenario in FL, where clients possess uniformly limited and highly skewed data.
Abstract: Learning a global model by abstracting the knowledge distributed across multiple clients, without aggregating the raw data, is the primary goal of Federated Learning (FL). Typically, this proceeds in rounds that alternate between parallel local training at several clients and model aggregation at a server. We find that existing FL methods under-perform when local datasets are small and exhibit severe label skew, as these conditions lead to over-fitting and local model bias; this setting is realistic in many real-world applications. To address the problem, we propose FLea, a unified framework that tackles over-fitting and local bias by encouraging clients to exchange privacy-protected features to aid local training. These features are activations from an intermediate layer of the model, obfuscated before being shared with other clients to protect sensitive information in the data. FLea leverages a novel way of combining local and shared features as augmentations to enhance local model learning. Our extensive experiments demonstrate that FLea outperforms state-of-the-art FL methods that share only model parameters by up to $17.6\%$, and outperforms FL methods that share data augmentations by up to $6.3\%$, while reducing the privacy vulnerability associated with shared data augmentations.
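To illustrate the feature-augmentation idea described in the abstract, below is a minimal PyTorch sketch of one local training step: it extracts intermediate-layer activations from a toy model and combines local features with features received from other clients via a mixup-style interpolation. The model architecture, split point, Beta-sampled mixing coefficient, and interpolated loss are illustrative assumptions; the abstract does not specify FLea's exact obfuscation or combination scheme.

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Toy classifier split into a feature extractor and a head.
    The split point is an assumption; FLea shares activations from
    some intermediate layer, not necessarily this one."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4, num_classes)

    def features(self, x):
        return self.extractor(x)

    def forward(self, x):
        return self.head(self.features(x))

def mix_features(local_feat, shared_feat, alpha=0.4):
    """Mixup-style interpolation of local and shared features
    (an assumed combination scheme, not FLea's exact design)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)  # keep the local feature dominant
    return lam * local_feat + (1 - lam) * shared_feat, lam

# Usage sketch: one local step with feature augmentation.
model = SplitModel()
criterion = nn.CrossEntropyLoss()
x_local = torch.randn(8, 3, 32, 32)
y_local = torch.randint(0, 10, (8,))
# Features and labels received from other clients; in FLea these
# would already be obfuscated before sharing.
feat_shared = torch.randn(8, 16 * 4 * 4)
y_shared = torch.randint(0, 10, (8,))

feat_local = model.features(x_local)
mixed, lam = mix_features(feat_local, feat_shared)
logits = model.head(mixed)
# Interpolated loss over both label sets, weighted by the mixing ratio.
loss = lam * criterion(logits, y_local) + (1 - lam) * criterion(logits, y_shared)
loss.backward()
```

Clamping the mixing coefficient so that the local feature dominates reflects the stated goal of using shared features to aid, rather than replace, local training; the choice of `alpha` here is purely illustrative.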
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5440