Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Prototypes

Published: 18 May 2026 · Last Modified: 23 Apr 2026 · The IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid) · CC BY 4.0
Abstract: Heterogeneous Federated Learning (HFL) has gained significant attention for its capacity to handle both model and data heterogeneity across clients. Prototype-based HFL methods have emerged as a promising solution to statistical and model heterogeneity as well as privacy challenges, paving the way for new advances in HFL research. These methods share class-representative prototypes among heterogeneous clients. However, aggregating these prototypes via standard weighted averaging often yields sub-optimal global knowledge: the averaging shrinks the aggregated prototypes' decision margins, degrading model performance under model heterogeneity and non-IID data distributions. We propose FedProtoKD, a Heterogeneous Federated Learning framework that leverages clients' logits and prototype feature representations to improve system performance via an enhanced dual-knowledge distillation mechanism. The framework resolves the prototype margin-shrinking problem with a contrastive learning-based trainable prototype server that applies a class-wise adaptive prototype margin. Furthermore, it assesses the importance of public samples by measuring how close each sample's prototype is to its class representative, thereby enhancing learning performance. FedProtoKD improved test accuracy by an average of 1.13% and by up to 34.13% across various settings, significantly outperforming existing state-of-the-art HFL methods.
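The abstract describes two building blocks: class-representative prototypes (the mean feature vector of each class) and an importance weight for public samples based on how close a sample's feature is to its class prototype. The paper's exact formulation is not given here, so the sketch below is only illustrative: it assumes prototypes are feature means and uses cosine similarity (rescaled to [0, 1]) as the closeness measure; the function names and the `1e-12` stabilizer are invented for the example.

```python
import numpy as np

def class_prototypes(features, labels):
    """Per-class prototypes as the mean feature vector of each class.

    features: (N, D) array of penultimate-layer embeddings.
    labels:   (N,) integer class labels.
    Returns a dict mapping class id -> (D,) prototype.
    """
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def sample_importance(features, labels, prototypes):
    """Weight each sample by the cosine similarity between its feature and
    its class prototype: samples closer to the class representative get
    higher importance. Cosine in [-1, 1] is rescaled to [0, 1]."""
    weights = np.empty(len(labels))
    for i, (f, c) in enumerate(zip(features, labels)):
        p = prototypes[c]
        weights[i] = f @ p / (np.linalg.norm(f) * np.linalg.norm(p) + 1e-12)
    return (weights + 1.0) / 2.0

# Toy demo: two well-separated classes in a 2-D feature space.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal([5.0, 0.0], 0.5, size=(10, 2)),
                   rng.normal([0.0, 5.0], 0.5, size=(10, 2))])
labs = np.array([0] * 10 + [1] * 10)
protos = class_prototypes(feats, labs)
w = sample_importance(feats, labs, protos)
assert w.shape == (20,) and np.all((w >= 0.0) & (w <= 1.0))
```

In a distillation loop, such weights could scale each public sample's contribution to the distillation loss, so that samples whose features sit near their class representative dominate the transferred knowledge.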