Abstract: Federated Distillation (FD) has recently attracted increasing attention for its efficiency in aggregating multiple diverse local models trained on the statistically heterogeneous data of distributed clients. Existing FD methods generally treat these models equally, merely averaging their output soft predictions for a given distillation sample; this ignores the diversity across local models and degrades the performance of the aggregated model, especially when some local models have learned little about the sample. In this paper, we propose a new perspective that treats the local data on each client as a specific domain, and we design a novel domain-knowledge-aware federated distillation method, dubbed DaFKD, which discerns the importance of each model to a given distillation sample and can thus optimize the ensemble of soft predictions from diverse models. Specifically, we employ a domain discriminator for each client, trained to identify the correlation factor between a sample and the corresponding domain. Then, to facilitate training the domain discriminator while saving communication costs, we propose sharing part of its parameters with the classification model. Extensive experiments on various datasets and settings show that the proposed method improves model accuracy by up to 6.02% compared to state-of-the-art baselines.
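To make the aggregation idea concrete, the sketch below illustrates a discriminator-weighted ensemble of soft predictions. It is a minimal illustration, not the authors' implementation: it assumes each client k supplies a soft prediction p_k(x) over the classes and a domain-discriminator score d_k(x) in [0, 1] indicating how well the distillation sample x matches that client's local domain; the distillation target is then the importance-weighted average of the clients' predictions rather than a plain mean.

import numpy as np

def weighted_ensemble(soft_preds, domain_scores, eps=1e-8):
    # soft_preds:    (K, C) array, soft prediction of each of K clients over C classes.
    # domain_scores: (K,) array, each client's domain-discriminator score for the sample.
    # Returns the (C,) ensemble soft label used as the distillation target.
    # Illustrative only; the exact weighting scheme in DaFKD may differ.
    w = np.asarray(domain_scores, dtype=float)
    w = w / (w.sum() + eps)                  # normalize importance factors
    return (w[:, None] * np.asarray(soft_preds)).sum(axis=0)

# Example: the third client's discriminator recognizes the sample,
# so its prediction dominates the ensemble instead of being averaged away.
preds = np.array([[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]])
scores = np.array([0.10, 0.05, 0.90])
print(weighted_ensemble(preds, scores))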