Keywords: Privacy-preserving, Knowledge distillation
TL;DR: We theoretically analyze sparse logits can make knowledge distillation fail, and propose Stingy Teacher with just a simple manipulation of the output logits to prevent unauthorized cloning through knowledge distillation
Abstract: Knowledge distillation (KD) aims to transfer the power of pre-trained teacher models to (more lightweight) student models. However, KD also poses the risk of intellectual properties (IPs) leakage of teacher models. Even if the teacher model is released as a black box, it can still be cloned through KD by imitating input-output behaviors. To address this unwanted effect of KD, the concept of Nasty Teacher was proposed recently. It is a special network that achieves nearly the same accuracy as a normal one, but significantly degrades the accuracy of student models trying to imitate it. Previous work builds the nasty teacher by retraining a new model and distorting its output distribution from the normal one via an adversarial loss. With this design, the ``nasty" teacher tends to produce sparse and noisy logits. However, it is unclear why the distorted distribution is catastrophic to the student model, as the nasty logits still maintain the correct labels. In this paper, we provide a theoretical analysis of why the sparsity of logits is key to Nasty Teacher. Furthermore, we propose an ideal version of nasty teacher to prevent imitation through KD, named $\textit{Stingy Teacher}$. The Stingy Teacher directly manipulates the logits of a standard pre-trained network by maintaining the values for a small subset of classes while zeroing out the rest. Extensive experiments on several datasets demonstrate that stingy teacher is more catastrophic to student models on both standard KD and data-free KD. Source code and trained model can be found at https://github.com/HowieMa/stingy-teacher.
0 Replies
Loading