Training an Anti-KD Model that Cannot Teach Students via Similarity Disruption

Published: 2025, Last Modified: 04 Oct 2025, ICASSP 2025, CC BY-SA 4.0
Abstract: Knowledge Distillation (KD) aims to enhance the performance of student models by transferring knowledge from teacher models. Despite these benefits, KD also poses intellectual property risks that cannot be ignored: even if a model is released without its training data or is offered only as a service, an adversary can still clone it using KD. To mitigate these risks, some researchers propose training an anti-KD model that cannot teach student models. However, we find that existing methods cannot defend against representation-based KD. To address the knowledge leakage from representations, we introduce Similarity Disruption (SD). SD increases the distance between the representation similarity matrices of our anti-KD model and those of a normal model, thereby reducing the effective information in the representation space. Extensive experiments demonstrate that the proposed method can effectively defend against representation-based KD.
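
The idea of pushing two representation similarity matrices apart can be illustrated with a minimal sketch. The abstract does not specify how similarity or distance is measured, so everything below is an assumption made for illustration: cosine similarity between samples in a batch, mean squared difference as the distance, and a hypothetical objective that subtracts the weighted distance from the task loss. The function names and the `weight` parameter are not from the paper.

```python
import math


def similarity_matrix(reps):
    """Pairwise cosine similarity between the rows of a batch of representations.

    Assumption: cosine similarity is used; the abstract does not name the measure.
    """
    normed = []
    for row in reps:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in row])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def similarity_distance(reps_a, reps_b):
    """Mean squared difference between the two batch similarity matrices.

    Assumption: this distance is chosen for illustration only.
    """
    sim_a = similarity_matrix(reps_a)
    sim_b = similarity_matrix(reps_b)
    n = len(sim_a)
    total = sum(
        (sim_a[i][j] - sim_b[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


def anti_kd_objective(task_loss, anti_reps, normal_reps, weight=1.0):
    """Hypothetical objective: keep the task loss low while maximizing the
    distance between the similarity matrices (hence the minus sign)."""
    return task_loss - weight * similarity_distance(anti_reps, normal_reps)


# Toy batch of two samples with 2-dimensional representations.
normal_reps = [[1.0, 0.0], [0.0, 1.0]]  # the two samples look dissimilar
anti_reps = [[1.0, 0.0], [1.0, 0.0]]    # the two samples look identical
distance = similarity_distance(anti_reps, normal_reps)
objective = anti_kd_objective(1.0, anti_reps, normal_reps, weight=2.0)
```

In this toy batch the two models disagree about how similar the two samples are, so the distance is nonzero and lowers the objective. In an actual training setup the representations would come from intermediate layers of the two networks and the objective would be minimized with gradient descent.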