Keywords: Federated Learning; Dataset Distillation; Data Heterogeneity
TL;DR: FedDualMatch addresses data heterogeneity in federated learning by replacing model communication with dual-matching-based dataset distillation, yielding effective, fast-converging, and privacy-preserving training.
Abstract: Federated Learning (FL) often suffers from error accumulation during local training, particularly on heterogeneous data, which hampers overall performance and convergence. While dataset distillation is commonly introduced to FL to improve efficiency, we find that communicating distilled data instead of models eliminates the error-accumulation issue entirely, albeit at the cost of exacerbating data heterogeneity across clients. To address the heterogeneity amplified by distilled data, we propose a novel FL algorithm, \textit{FedDualMatch}, which performs dual matching: local distribution matching on each client captures that client's data distribution, while global gradient matching on the server aligns gradients across clients. This dual approach enriches feature representations and improves convergence stability. We prove its effectiveness for FL by bounding the difference in test loss between the optimal models trained on the aggregated distilled data and on the aggregated original data across clients, and we further show that it converges to within a bounded constant of the optimal model's loss. Experiments on controlled heterogeneous datasets (MNIST, CIFAR-10) and naturally heterogeneous datasets (Digit-Five, Office-Home) demonstrate its advantages in accuracy and convergence over state-of-the-art methods that communicate either models or distilled data. Notably, it maintains accuracy even as data heterogeneity increases substantially, underscoring its potential for practical applications.
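The abstract describes the two matching steps only at a high level; the minimal sketch below illustrates the general idea of communicating distilled data instead of models. All names, network choices, shapes, and hyperparameters are illustrative assumptions, and the server step shown uses plain averaging of per-client gradients as a stand-in for the paper's gradient-matching objective; this is not the authors' implementation.

```python
# Illustrative sketch: clients distill data by matching feature statistics,
# the server updates a global model from gradients on the communicated
# distilled data. All components below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

embed = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))   # shared feature extractor (assumed)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # global model held by the server (assumed)


def distill_client_data(real_x, real_y, images_per_class=1, steps=200, lr=0.1):
    """Client side: optimize a small synthetic set so its per-class mean
    embedding matches that of the client's real data (distribution matching)."""
    classes = real_y.unique()
    synth_x = torch.randn(len(classes) * images_per_class, 1, 28, 28, requires_grad=True)
    synth_y = classes.repeat_interleave(images_per_class)
    opt = torch.optim.SGD([synth_x], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for c in classes:
            loss = loss + F.mse_loss(
                embed(synth_x[synth_y == c]).mean(0),
                embed(real_x[real_y == c]).mean(0),
            )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return synth_x.detach(), synth_y


def server_round(distilled_sets, lr=0.05):
    """Server side: compute per-client gradients on the communicated distilled
    data and apply their average to the global model (simple stand-in for the
    paper's gradient-matching aggregation)."""
    per_client_grads = []
    for x, y in distilled_sets:
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        per_client_grads.append([p.grad.clone() for p in model.parameters()])
    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            p -= lr * torch.stack([g[i] for g in per_client_grads]).mean(0)


# Two toy "clients" with disjoint label ranges to mimic heterogeneity.
client_a = (torch.randn(64, 1, 28, 28), torch.randint(0, 5, (64,)))
client_b = (torch.randn(64, 1, 28, 28), torch.randint(5, 10, (64,)))
distilled = [distill_client_data(*client_a), distill_client_data(*client_b)]
server_round(distilled)
```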
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 520