Sampling to Distill: Knowledge Transfer from Open-World Data

Published: 20 Jul 2024 · Last Modified: 04 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Data-Free Knowledge Distillation (DFKD) aims to train a high-performance student model using only a pre-trained teacher network, without access to the original training data. Most existing DFKD methods rely heavily on additional generation modules to synthesize substitute data, incurring high computational costs while ignoring the massive amounts of easily accessible, low-cost, unlabeled open-world data. Moreover, existing methods overlook the domain shift between the substitute data and the original data: knowledge from the teacher is therefore not always trustworthy, and structured knowledge from the data itself becomes a crucial supplement. To tackle these issues, we propose a novel Open-world Data Sampling Distillation (ODSD) method for the DFKD task that removes the redundant generation process. First, we sample open-world data close to the original data's distribution with an adaptive sampling module and introduce a low-noise representation to alleviate the domain shift. Then, we build structured relationships among multiple data examples to exploit data knowledge through both the student model itself and the teacher's structured representation. Extensive experiments on CIFAR-10, CIFAR-100, NYUv2, and ImageNet show that ODSD achieves state-of-the-art performance with fewer FLOPs and parameters. In particular, it improves accuracy on ImageNet by 1.50%-9.59% and avoids training a separate generator for each class.
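The abstract's pipeline — filter unlabeled open-world data toward the teacher's training distribution, then distill on the retained samples — can be sketched minimally. The paper's adaptive sampling module and structured relational losses are not public in this page, so the snippet below stands in with two common, hypothetical building blocks: a teacher-confidence filter as a crude proxy for distribution-aware sampling, and the standard temperature-scaled KL distillation loss. Function names and the confidence threshold are illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the class axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_by_teacher_confidence(teacher_probs, threshold=0.9):
    """Crude stand-in for the adaptive sampling module (assumption):
    keep unlabeled samples the teacher classifies confidently, on the
    premise that they lie closer to the original data's distribution."""
    conf = teacher_probs.max(axis=1)
    return np.nonzero(conf >= threshold)[0]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Standard KD objective: KL(teacher || student) on temperature-
    softened outputs, scaled by T^2 to keep gradient magnitudes stable."""
    p_t = softmax(teacher_logits, T)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=1).mean()
    return kl * T * T
```

In this toy form, training would iterate over unlabeled batches, apply the filter, and minimize the distillation loss only on the retained samples; ODSD's actual sampling and structured-knowledge terms replace both stand-ins.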
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work considers a real-world application scenario for deep learning under data-privacy constraints. By leveraging easily accessible, low-cost, unlabeled open-world data, our method removes the redundant and expensive generation modules used by prior work on the Data-Free Knowledge Distillation (DFKD) task. In multimedia applications, issues such as data privacy and missing modalities urgently need solutions, and many competitive methods supplement and enrich their training data with customized generation modules. In contrast, thanks to its lower cost, faster speed, and competitive performance, our method provides a novel solution and deployment option for the multimedia community. In addition, multimodal foundation models often rely on large-scale training data that is difficult to transmit at scale; we hope to apply the proposed technique to distill such models onto mobile devices while overcoming their strong dependence on data.
Supplementary Material: zip
Submission Number: 1260
