Communication-Efficient Loss Minimization over Heterogeneous Data with Federated Hierarchical Ensemble Aggregation via Distillation

Published: 10 Oct 2024, Last Modified: 07 Dec 2024, NeurIPS 2024 Workshop, CC BY 4.0
Keywords: Federated Learning, Heterogeneous Data, Ensemble Distillation, Hierarchical Aggregation, SGD Convergence
Abstract: Distributed optimization through federated learning (FL) suffers from data heterogeneity, particularly when the client datasets are highly imbalanced. Model aggregation via ensemble distillation is an effective way to address this issue. However, no previous work on ensemble distillation in FL considers hierarchical model aggregation, which is important for reducing communication overhead over a large network. In this work, we propose new methods that enable ensemble distillation in a hierarchical FL system. We develop a Federated Hierarchical Ensemble Aggregation via Distillation (FedHEAD) algorithm that performs ensemble distillation by reusing the clients' local data within each network sector of the hierarchy. We also extend it to FedHEAD+, which takes advantage of reference data when it is available at the server. We provide theoretical analysis of FedHEAD and FedHEAD+, showing that under a wide range of conditions, our proposed schemes achieve faster convergence than existing non-hierarchical alternatives. Furthermore, extensive experiments on computer vision, natural language processing, and network traffic classification datasets show that the proposed schemes are robust to hierarchical model aggregation in the network.
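Since the abstract only describes the method at a high level, the following is a minimal, illustrative sketch of what hierarchical ensemble-distillation aggregation can look like in general: clients within a network sector send their models to a sector aggregator, which distills the ensemble's averaged predictions into a single model using data available inside the sector, and the server then combines the sector models. All function names, the toy model, the synthetic data, and the plain parameter averaging at the top level are assumptions made for illustration only; this is not the FedHEAD or FedHEAD+ procedure defined in the paper.

```python
# Hypothetical sketch of hierarchical ensemble distillation in FL
# (not the authors' FedHEAD implementation).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    # Toy classifier standing in for the shared model architecture.
    return nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))

def distill_ensemble(client_models, student, data, epochs=1, temperature=2.0, lr=1e-2):
    """Distill the averaged soft predictions of an ensemble of client models
    into a single student model, using data available within the sector
    (or reference data at the server)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(epochs):
        with torch.no_grad():
            # Ensemble teacher: average of the clients' softened predictions.
            teacher_probs = torch.stack(
                [F.softmax(m(data) / temperature, dim=-1) for m in client_models]
            ).mean(dim=0)
        student_log_probs = F.log_softmax(student(data) / temperature, dim=-1)
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

def average_weights(models):
    # Plain parameter averaging (FedAvg-style) used at the top level in this sketch.
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack([m.state_dict()[key].float() for m in models]).mean(dim=0)
    return avg

# Two-level hierarchy: clients -> sector aggregators -> global server.
torch.manual_seed(0)
sector_models = []
for _ in range(3):  # 3 network sectors with 4 clients each (hypothetical sizes)
    clients = [make_model() for _ in range(4)]
    sector_data = torch.randn(64, 20)  # data reused within the sector (synthetic here)
    sector_models.append(distill_ensemble(clients, make_model(), sector_data, epochs=5))

global_model = make_model()
global_model.load_state_dict(average_weights(sector_models))
print("aggregated global model:", global_model)
```

In this sketch the distillation loss is a KL divergence between the student's softened outputs and the ensemble's averaged soft predictions; the second aggregation level simply averages the sector models' parameters, whereas the paper's schemes perform hierarchical aggregation via distillation.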
Submission Number: 62