Homogenizing Non-IID Datasets via In-Distribution Knowledge Distillation for Decentralized Learning

TMLR Paper 2281 Authors

22 Feb 2024 (modified: 11 Mar 2024) · Under review for TMLR
Abstract: Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner across multiple nodes. One of the key challenges with decentralized learning is heterogeneity in the data distribution across the nodes. Data heterogeneity results in slow and unstable global convergence and therefore poor generalization performance. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address the challenge of heterogeneous data distribution. The goal of IDKD is to homogenize the data distribution across the nodes. While such data homogenization can be achieved by exchanging data among the nodes at the cost of privacy, IDKD achieves the same objective using a common public dataset across nodes without breaking the privacy constraint. This public dataset is different from the training dataset and is used to distill the knowledge from each node and communicate it to its neighbors through the generated labels. With traditional knowledge distillation, the generalization of the distilled model is reduced due to misalignment between the private and public data distributions. Thus, we introduce an Out-of-Distribution (OoD) detector at each node to label a subset of the public dataset that maps close to the local training data distribution. Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
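To make the per-node procedure described in the abstract concrete, the following is a minimal sketch in PyTorch of the two steps IDKD attributes to each node: filtering the public dataset with an OoD detector so that only samples close to the local data distribution are labeled, and distilling the labels communicated by neighbors back into the local model. All names (select_in_distribution, distill_from_neighbors, the confidence threshold) are illustrative assumptions, and the OoD detector here is a simple maximum-softmax-probability filter, not necessarily the detector used in the paper.

```python
# Hypothetical sketch of one IDKD node's local steps; names and the
# maximum-softmax OoD criterion are assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def select_in_distribution(model, public_inputs, ood_threshold=0.8):
    """Keep public samples whose max softmax confidence exceeds the threshold,
    i.e. samples that map close to the node's local training distribution,
    and return them together with the node's generated labels."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(public_inputs), dim=1)
        confidence, labels = probs.max(dim=1)
        keep = confidence >= ood_threshold
    return public_inputs[keep], labels[keep]


def distill_from_neighbors(model, optimizer, public_inputs, neighbor_labels):
    """Distill the knowledge communicated by neighbors (their generated labels
    on the shared public subset) into the local model."""
    model.train()
    loss = F.cross_entropy(model(public_inputs), neighbor_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with a dummy classifier and random CIFAR-sized "public" images.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    public_batch = torch.randn(64, 3, 32, 32)

    # Step 1: the node labels the in-distribution subset of the public data.
    subset, labels = select_in_distribution(model, public_batch, ood_threshold=0.1)

    # Step 2: labels are exchanged over the graph topology; here we reuse the
    # node's own labels as a stand-in for a neighbor's generated labels.
    if len(subset) > 0:
        print(f"distillation loss: {distill_from_neighbors(model, optimizer, subset, labels):.4f}")
```

In an actual decentralized run, the generated labels (not the private data) would be communicated to neighbors on the graph, which is what keeps the communication overhead small while preserving privacy, as claimed in the abstract.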
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zachary_B._Charles1
Submission Number: 2281