Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY-NC-ND 4.0
Abstract: Federated ensemble distillation addresses client heterogeneity by generating pseudo-labels for an unlabeled server dataset based on client predictions and training the server model using the pseudo-labeled dataset. The unlabeled server dataset can either be pre-existing or generated through a data-free approach. The effectiveness of this approach critically depends on the method of assigning weights to client predictions when creating pseudo-labels, especially in highly heterogeneous settings. Inspired by theoretical results from GANs, we propose a provably near-optimal weighting method that leverages client discriminators trained with a server-distributed generator and local datasets. Our experiments on various image classification tasks demonstrate that the proposed method significantly outperforms baselines. Furthermore, we show that the additional communication cost, client-side privacy leakage, and client-side computational overhead introduced by our method are negligible, both in scenarios with and without a pre-existing server dataset.
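Below is a minimal, hypothetical sketch (not the authors' released code) of how discriminator-based weighting could enter server-side ensemble distillation as described in the abstract: each client's prediction on an unlabeled sample is weighted by that client's discriminator score before the weighted soft labels are distilled into the server model. Tensor shapes, function names, and the normalization of discriminator scores into weights are illustrative assumptions; refer to the linked repository for the actual FedGO implementation.

```python
import torch
import torch.nn.functional as F

def weighted_pseudo_labels(client_logits, client_disc_scores, eps=1e-8):
    """Combine per-client predictions into soft pseudo-labels.

    client_logits:      (K, B, C) logits from K client models on B unlabeled samples
    client_disc_scores: (K, B)    client discriminator outputs in [0, 1]; higher means
                                  the sample looks closer to that client's local data
    returns:            (B, C)    soft pseudo-labels (convex combination of client softmaxes)
    """
    probs = F.softmax(client_logits, dim=-1)                           # (K, B, C)
    weights = client_disc_scores / (client_disc_scores.sum(dim=0) + eps)  # normalize over clients
    return (weights.unsqueeze(-1) * probs).sum(dim=0)                  # (B, C)

def distillation_step(server_model, optimizer, x_unlabeled, client_logits, client_disc_scores):
    """One server-side ensemble-distillation step on a batch of unlabeled data."""
    pseudo = weighted_pseudo_labels(client_logits, client_disc_scores)
    log_q = F.log_softmax(server_model(x_unlabeled), dim=-1)
    loss = F.kl_div(log_q, pseudo, reduction="batchmean")  # fit server output to pseudo-labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```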
Lay Summary: In federated learning, multiple devices (clients) collaboratively train a machine learning model without sharing their private data. However, differences in data across devices—called heterogeneity—can hurt the overall model performance. One promising solution is federated ensemble distillation, which allows a central server to learn from client predictions by generating “pseudo-labels” for an unlabeled dataset. This study introduces a smarter way to combine the predictions from different clients by assigning more reliable weights to each prediction. Inspired by insights from generative adversarial networks (GANs), the proposed approach finds nearly the best possible weighting strategy, even when the clients’ data are very different. Experiments on image classification tasks show that this technique significantly improves performance over existing methods. Importantly, it does so without adding much communication cost, privacy risk, or extra work for each client—even when the server starts without any dataset of its own.
Link To Code: https://github.com/pupiu45/FedGO
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: Federated learning, ensemble distillation, data heterogeneity, generative adversarial network
Submission Number: 15561