Abstract: While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as asynchrony, computational expense, data heterogeneity, and gradient and membership privacy attacks. Lately, dataset distillation has emerged as a promising solution for addressing the aforementioned challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), where we seamlessly integrate dataset distillation algorithms into the FL pipeline and train FL using a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize the domain shifts, we propose iterative distribution matching to inpaint global information into *local virtual data* and use federated gradient matching to distill *global virtual data* that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms *state-of-the-art* heterogeneous FL algorithms under various settings.
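For readers unfamiliar with the distillation step described above, the following is a minimal sketch of distribution-matching-based dataset distillation for producing a client's local virtual data; it is not the paper's implementation, and all names (e.g., `distill_local_virtual_data`, `feature_net`) and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: distill a small set of virtual images so their
# feature statistics match the client's real data (distribution matching).
import torch
import torch.nn as nn

def distill_local_virtual_data(real_loader, feature_net: nn.Module,
                               num_virtual: int = 100, img_shape=(3, 32, 32),
                               steps: int = 500, lr: float = 0.1,
                               device: str = "cpu"):
    """Optimize virtual images so that their mean feature embedding matches
    the mean embedding of the client's real data."""
    virtual = torch.randn(num_virtual, *img_shape, device=device, requires_grad=True)
    optimizer = torch.optim.SGD([virtual], lr=lr, momentum=0.5)
    feature_net = feature_net.to(device).eval()

    for _ in range(steps):
        # Sample a batch of real images and compute its (detached) feature mean.
        real_images, _ = next(iter(real_loader))
        with torch.no_grad():
            real_mean = feature_net(real_images.to(device)).mean(dim=0)

        # Match the virtual data's feature mean to the real feature mean.
        virtual_mean = feature_net(virtual).mean(dim=0)
        loss = ((virtual_mean - real_mean) ** 2).sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return virtual.detach()
```

In the paper's setting, the matching is iterated with globally shared information so that each client's local virtual data absorbs global statistics; the sketch above only shows the basic local matching step.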
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=lEgqjLhdOz
Changes Since Last Submission: In the previous submission, we accidentally overwrote the font, which resulted in an incorrect format.
We have corrected the formatting issue and ensured the submission is within 12 pages.
We apologize for any inconvenience and thank you for reviewing our resubmission!
Assigned Action Editor: ~Pin-Yu_Chen1
Submission Number: 3581