Diverse Image Priors for Black-box Data-free Knowledge Distillation

TMLR Paper6347 Authors

31 Oct 2025 (modified: 06 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Knowledge distillation (KD) is a well-known technique for effectively transferring knowledge from an expert network (teacher) to a smaller network (student) with little sacrifice in performance. However, most KD methods require extensive access to the teacher or even to its original training set, which is often unavailable due to intellectual property or security concerns. These challenges have inspired black-box data-free KD, in which only the teacher's top-1 predictions, and no real data, are available. While recent approaches turn to synthetic data, they largely overlook data diversity, which is crucial for effective knowledge transfer. We propose Diverse Image Priors Knowledge Distillation (DIP-KD) to address this problem. We first synthesize image priors, i.e., semantically diverse synthetic images, then further optimize them with a diversity objective via contrastive learning, and finally extract soft knowledge to distill the student. We achieve state-of-the-art KD performance in the black-box data-free setting on eight image benchmarks. This result is supported by an in-depth analysis showing that data diversity is effectively improved and how it facilitates KD performance. We publish the source code at https://osf.io/5mry8/?view_only=dee9e8fbcd114c34b45aa958a3aa32fa.
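To make the described pipeline concrete, below is a minimal sketch (not the authors' implementation; see the linked OSF repository for that) of two of its ingredients: a contrastive-style diversity objective that pushes synthetic image priors apart in an embedding space, and a black-box distillation step that trains the student using only the teacher's top-1 predictions. All names (`embed`, `student`, `synth_images`, `teacher_top1_labels`) are hypothetical, and the use of label smoothing to turn hard top-1 labels into soft targets is an assumption for illustration only.

```python
# Minimal sketch of a diversity objective and a black-box KD step.
# Hypothetical names; assumptions are noted in comments.
import torch
import torch.nn.functional as F

def contrastive_diversity_loss(embeddings: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Encourage diversity by penalizing pairwise similarity between
    embeddings of the synthetic image priors (InfoNCE-style repulsion)."""
    z = F.normalize(embeddings, dim=1)                      # (N, D) unit vectors
    sim = z @ z.t() / temperature                           # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # ignore self-similarity
    # Lower log-sum-exp over off-diagonal similarities => more diverse images.
    return torch.logsumexp(sim, dim=1).mean()

def blackbox_distillation_step(student, synth_images, teacher_top1_labels, optimizer):
    """Black-box KD step: only the teacher's top-1 class indices are available.
    Assumption: hard labels are softened via label smoothing as a stand-in
    for the paper's soft-knowledge extraction."""
    optimizer.zero_grad()
    logits = student(synth_images)
    loss = F.cross_entropy(logits, teacher_top1_labels, label_smoothing=0.1)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the diversity loss would be applied while optimizing the synthetic images (with `embed(synth_images)` providing the embeddings), and the distillation step would then consume the resulting images together with the teacher's top-1 labels queried through its black-box interface.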
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Farzan_Farnia1
Submission Number: 6347