Abstract: While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as client asynchrony, computational expense, data heterogeneity, and gradient and membership privacy attacks. Recently, dataset distillation has emerged as a promising solution to these challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), which seamlessly integrates dataset distillation algorithms into the FL pipeline and trains FL models on a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize domain shifts, we propose iterative distribution matching to inpaint global information into *local virtual data* and use federated gradient matching to distill *global virtual data* that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario with a large number of clients holding heterogeneous and class-imbalanced data. Our method outperforms *state-of-the-art* heterogeneous FL algorithms under various settings.
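To make the distribution-matching idea mentioned in the abstract concrete, below is a minimal, hedged sketch of how a client could optimize local virtual data so that its feature statistics match those of the real local data. All names (`encoder`, `real_images`, `virtual_images`) are illustrative placeholders and not taken from the FedLGD codebase; the actual method iterates this with a globally shared feature extractor.

```python
import torch
import torch.nn as nn

# Hypothetical feature extractor; any nn.Module mapping images to embeddings works.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 64))

# Placeholder real local data and learnable local virtual (distilled) data.
real_images = torch.randn(256, 1, 28, 28)                         # a client's real batch
virtual_images = torch.randn(10, 1, 28, 28, requires_grad=True)   # compact virtual dataset

optimizer = torch.optim.SGD([virtual_images], lr=0.1)

for step in range(100):
    optimizer.zero_grad()
    real_feat = encoder(real_images).mean(dim=0)       # mean embedding of real data
    virt_feat = encoder(virtual_images).mean(dim=0)    # mean embedding of virtual data
    loss = ((real_feat - virt_feat) ** 2).sum()        # match the two feature distributions
    loss.backward()
    optimizer.step()
```

This is only a sketch of plain distribution matching under assumed names; the paper's iterative variant and the federated gradient matching used for global virtual data are described in the full text and released code.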
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=lEgqjLhdOz
Changes Since Last Submission: We have made the changes recommended by the reviewers. Since we moved the algorithm box to the main text, the main content is now 14 pages.
Code: https://github.com/ubc-tea/FedLGD
Supplementary Material: pdf
Assigned Action Editor: ~Pin-Yu_Chen1
Submission Number: 3581