Keywords: Federated learning, Federated distillation, Logit aggregation, Communication efficiency, Data heterogeneity, Knowledge distillation.
TL;DR: We study logit-based federated distillation and show that uncertainty weighting and meta-model aggregation improve robustness to client heterogeneity compared to simple averaging.
Abstract: Federated learning (FL) typically shares model weights or gradients, which becomes costly for large models. Logit-based FL reduces this cost by sharing only logits computed on a public proxy dataset, but aggregating logits from heterogeneous clients remains challenging. This paper studies that problem by introducing and comparing three logit aggregation methods: simple averaging, uncertainty-weighted averaging, and a learned meta-aggregator. On MNIST and CIFAR-10, these methods reduce communication overhead, improve robustness under non-IID data, and achieve accuracy competitive with centralized training.
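To make the uncertainty-weighted averaging idea concrete, the following is a minimal sketch (not the authors' exact method): each client's logits on the shared proxy dataset are weighted inversely to the entropy of its softmax predictions, so more confident clients contribute more to the aggregated target. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_weighted_aggregate(client_logits, eps=1e-8):
    """Aggregate per-client logits computed on a shared public proxy set.

    client_logits: array of shape (num_clients, num_samples, num_classes).
    Each client's contribution is weighted inversely to its predictive
    entropy: lower entropy (higher confidence) -> larger weight.
    This is an illustrative sketch, not the paper's exact aggregator.
    """
    probs = softmax(client_logits, axis=-1)                        # (C, N, K)
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)          # (C, N)
    weights = 1.0 / (entropy + eps)                                # (C, N)
    weights = weights / weights.sum(axis=0, keepdims=True)         # normalize over clients
    # Per-sample weighted average of logits across clients.
    aggregated = (weights[..., None] * client_logits).sum(axis=0)  # (N, K)
    return aggregated

# Example: 3 clients, 5 proxy samples, 10 classes (hypothetical sizes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5, 10))
print(uncertainty_weighted_aggregate(logits).shape)  # (5, 10)
```

Simple averaging corresponds to setting all weights equal, while the learned meta-aggregator would replace the fixed entropy-based weighting with a trained model over the clients' logits.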
Submission Number: 127