Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning

M Yashwanth; Gaurav Kumar Nayak; Arya Singh; Yogesh Simmhan; Anirban Chakraborty

Published: 20 Dec 2024, Last Modified: 20 Dec 2024. Accepted by TMLR. License: CC BY 4.0

Abstract: Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating locally trained models without sharing any local training data. In practice, there is often substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by the clients. Under such non-iid label distributions across clients, FL suffers from the 'client-drift' problem, where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to each client's training data based on the global model's prediction entropy and the client-data label distribution. We show that the proposed regularization (ASD) can be easily integrated atop existing state-of-the-art FL algorithms, further boosting the performance of these off-the-shelf methods. We theoretically explain how incorporating the ASD regularizer reduces client drift and empirically justify the generalization ability of the trained model. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance when the proposed regularizer is combined with popular FL methods. The code is provided as supplementary material.
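
The abstract does not state the regularizer's formula, so the following is only an illustrative sketch of what an entropy- and label-aware self-distillation term could look like on the client side. The function names (`adaptive_weights`, `client_loss`), the weighting form `exp(-entropy) / label_frequency`, the batch normalization of the weights, and the coefficient `lam` are all assumptions made for illustration; consult the paper and the linked repository for the actual method.

```python
import numpy as np

EPS = 1e-12  # numerical floor to avoid log(0) and division by zero


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def adaptive_weights(global_logits, labels, label_counts):
    """Hypothetical per-sample weights (an assumption, not the paper's formula).

    Samples get a larger weight when the global model is confident
    (low prediction entropy) and when their label is rare on this client.
    Weights are normalized to sum to 1 over the batch.
    """
    p_global = softmax(global_logits)
    entropy = -np.sum(p_global * np.log(p_global + EPS), axis=1)
    label_freq = label_counts / label_counts.sum()
    raw = np.exp(-entropy) / (label_freq[labels] + EPS)
    return raw / raw.sum()


def client_loss(local_logits, global_logits, labels, label_counts, lam=1.0):
    """Cross-entropy plus a weighted KL(global || local) distillation term."""
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    n = labels.shape[0]
    # Standard supervised loss on the client's own labels.
    ce = -np.log(p_local[np.arange(n), labels] + EPS).mean()
    # Per-sample KL divergence from the global to the local prediction.
    kl = np.sum(
        p_global * (np.log(p_global + EPS) - np.log(p_local + EPS)), axis=1
    )
    weights = adaptive_weights(global_logits, labels, label_counts)
    return ce + lam * np.sum(weights * kl)
```

In this sketch the distillation term vanishes when the local and global models agree, so the regularizer only penalizes a client for moving its predictions away from the global model's, which is the intuition behind reducing client drift.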

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: All the suggested changes have been made in the camera-ready version. Below we summarize the changes. 1) Made the requested changes to Figure 1, demonstrating the impact of ASD for two clients and also for FedNTD and FedNTD+ASD. 2) Updated the referencing style. 3) Clearly stated in the abstract and introduction that the proposed method deals with label heterogeneity, and added a clarification on the hyperparameter ($\lambda$) in Sec. A.3. 4) Updated the captions of Figures 4, 5, and 6 and fixed a few typos in Tables 1 and 3. 5) Updated the link to the code in the main paper and rectified miscellaneous grammatical, typographical, and other minor errors.

Video: https://youtu.be/6ld6AKLsmY0

Code: https://github.com/vcl-iisc/fed-adaptive-self-distillation

Supplementary Material: zip

Assigned Action Editor: ~Novi_Quadrianto1

Submission Number: 2677
