Homogenizing Non-IID Datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Published: 11 Jul 2024, Last Modified: 11 Jul 2024, Accepted by TMLR
Abstract: Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner across multiple nodes. A key challenge in decentralized learning is heterogeneity in the data distribution across nodes. Data heterogeneity leads to slow and unstable global convergence and, consequently, poor generalization performance. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address the challenge of heterogeneous data distribution. The goal of IDKD is to homogenize the data distribution across the nodes. While such homogenization could be achieved by exchanging data among the nodes at the cost of privacy, IDKD achieves the same objective using a public dataset common to all nodes, without violating the privacy constraint. This public dataset is different from the training dataset and is used to distill the knowledge from each node and communicate it to its neighbors through the generated labels. With traditional knowledge distillation, the generalization of the distilled model is reduced due to misalignment between the private and public data distributions. We therefore introduce an Out-of-Distribution (OoD) detector at each node to label only the subset of the public dataset that maps close to the local training data distribution. Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have updated the paper based on the reviews. Specifically, we have added results with more baselines, results for IDKD under node failures to study robustness, an updated ablation study that identifies the effect of each component of IDKD, and further results on the size of the public dataset needed. We also clarify that IDKD targets a global solution when using decentralized learning, and we discuss its applicability to federated learning.
Video: https://youtu.be/I-den-2TNyk
Code: https://github.com/DeepakTatachar/IDKD
Supplementary Material: zip
Assigned Action Editor: ~Zachary_B._Charles1
Submission Number: 2281
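
Illustrative sketch: the abstract describes each node labeling only the public samples that fall close to its local data distribution and sharing those labels with its neighbors. The Python below is a minimal sketch of that idea, not the authors' implementation (see the linked code repository for that). It assumes a maximum-softmax-probability threshold as a stand-in OoD detector and assumes neighbors merge received labels by averaging; the function names, the threshold value, and the toy logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_in_distribution(public_logits, threshold=0.9):
    """Label only the public samples the local model is confident about.

    Assumption: max softmax probability >= threshold is used as a simple
    proxy for "in-distribution"; the paper's detector may differ.
    Returns {public_sample_index: soft_label}.
    """
    labels = {}
    for idx, logits in enumerate(public_logits):
        probs = softmax(logits)
        if max(probs) >= threshold:
            labels[idx] = probs
    return labels


def merge_labels(label_sets):
    """Average soft labels per public sample over the nodes that labeled it.

    Assumption: simple averaging; other aggregation rules are possible.
    """
    sums, counts = {}, {}
    for labels in label_sets:
        for idx, probs in labels.items():
            if idx not in sums:
                sums[idx] = list(probs)
                counts[idx] = 1
            else:
                sums[idx] = [s + p for s, p in zip(sums[idx], probs)]
                counts[idx] += 1
    return {idx: [s / counts[idx] for s in sums[idx]] for idx in sums}


# Toy example: two nodes score the same three public samples (3 classes).
node_a_logits = [[8.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 6.0, 0.0]]
node_b_logits = [[0.2, 0.1, 0.0], [0.0, 0.0, 7.0], [0.0, 5.0, 0.0]]

labels_a = label_in_distribution(node_a_logits)  # confident on samples 0 and 2
labels_b = label_in_distribution(node_b_logits)  # confident on samples 1 and 2
shared = merge_labels([labels_a, labels_b])      # covers samples 0, 1 and 2
```

In this toy run each node labels only the public samples it is confident about, and the merged label set covers more of the public dataset than either node's alone. Each node could then train on its private data plus the merged public labels.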