Privacy-Preserving Learning via Data and Knowledge Distillation

Published: 01 Jan 2023, Last Modified: 22 Jun 2025, DSAA 2023, CC BY-SA 4.0
Abstract: In the current era of data science, deep learning, computer vision, and image analysis have become ubiquitous across sectors, from government agencies and large corporations to small end devices, due to their ability to simplify people’s lives. However, the widespread use of sensitive image data and the high memorization capacity of deep learning pose significant privacy risks. A simple Google search can yield numerous images of a person, and knowing that a specific patient’s record was used to train a model associated with a disease may reveal the patient’s ailment, leading to membership privacy leakage and enabling more advanced attacks in the future. Furthermore, such unprotected models may also generalize poorly because of this overfitting to the training data. Previous state-of-the-art methods, such as differential privacy (DP) and regularizer-based defenses, sacrifice functionality, i.e., task accuracy, to preserve privacy. Such an imbalanced trade-off raises concerns about the practicality of these defenses. Other existing knowledge-transfer-based methods either reuse private data or require additional public data, which could compromise privacy and may not be viable in certain domains. To address these challenges, where membership privacy is of utmost importance and utility cannot be compromised, we propose a novel collaborative distillation approach that transfers the private model’s knowledge using a minimal amount of distilled synthetic data, yielding a compact private model trained in an end-to-end fashion. Empirically, our proposed method outperforms the most advanced models currently in use, increasing utility by almost 8%, 34%, and 6% on CIFAR-10, CIFAR-100, and MNIST, respectively. The utility closely resembles that of non-private counterparts while keeping membership privacy leakage to a respectable 50-53.5%, despite employing a smaller model with 50% fewer parameters.
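To make the knowledge-transfer step concrete, the sketch below illustrates generic knowledge distillation on a small synthetic (distilled) dataset: a compact student is trained to match the softened outputs of the private teacher, without touching the original private data at this stage. This is only an assumed, minimal PyTorch illustration of the underlying technique; the function names, temperature value, optimizer, and `synthetic_loader` are hypothetical and do not reflect the paper's exact collaborative training procedure.

```python
# Hypothetical sketch: generic knowledge distillation on distilled synthetic data.
# Not the authors' exact method; names and hyperparameters are illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Match the student's softened predictions to the teacher's."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence, scaled by T^2 as in standard distillation.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

def train_student(student, teacher, synthetic_loader, epochs=10, lr=1e-3):
    """Train a compact student only on the small synthetic set;
    the private training data is not used in this step."""
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in synthetic_loader:  # labels unused; the teacher supplies soft targets
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = distillation_loss(s_logits, t_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In such a setup, the synthetic set would come from a separate data-distillation step, and the student's smaller capacity plus the indirection through distilled data are what limit memorization of individual training records.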