Abstract: Data-free knowledge distillation (DFKD) has emerged as a pivotal technique for model compression, substantially reducing dependency on the original training data. Nonetheless, conventional DFKD methods that rely on synthesized training data suffer from inadequate diversity and distributional discrepancies between the synthesized and original datasets. To address these challenges, this paper introduces a novel approach to DFKD through diverse diffusion augmentation (DDA). Specifically, we recast the common data-synthesis paradigm in DFKD as a composite process, leveraging diffusion models after data synthesis for self-supervised augmentation, which generates a spectrum of data samples with similar distributions while retaining controlled variations. Furthermore, to mitigate excessive deviation in the embedding space, we introduce an image filtering technique grounded in cosine similarity to maintain fidelity during the knowledge distillation process. Comprehensive experiments on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets demonstrate the superior performance of our method across various teacher-student network configurations, outperforming contemporary state-of-the-art DFKD methods.
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Media Interpretation, [Generation] Generative Multimedia
Relevance To Conference: This paper introduces DDA (Diverse Diffusion Augmentation), a novel data-free knowledge distillation (DFKD) technique that extends conventional data synthesis in DFKD with data augmentation to further enhance data diversity, thus establishing a new paradigm for DFKD. Diffusion models are used to augment the synthetic images, which in turn help a lightweight student model complete the image classification task. Additionally, to address redundancy in data augmentation, we introduce an image filtering technique based on cosine similarity, which eliminates augmented images exhibiting significant deviations, resulting in improved performance. As a result, our DDA method is highly relevant to multimedia vision and language, is related to media interpretation and generative multimedia, and thus fully aligns with the conference's scope.
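The cosine-similarity filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the embedding layout, and the `threshold` value are all assumptions for exposition; in practice the embeddings would come from the teacher network's feature extractor.

```python
import numpy as np

def filter_augmented(orig_embs: np.ndarray, aug_embs: np.ndarray,
                     threshold: float = 0.8) -> np.ndarray:
    """Keep augmented samples whose embedding stays close to its source.

    orig_embs, aug_embs: (N, D) arrays of feature embeddings, where row i of
    aug_embs is the diffusion-augmented version of row i of orig_embs.
    threshold: hypothetical similarity cutoff (the paper's value may differ).
    Returns a boolean mask of samples to keep.
    """
    # Cosine similarity between each original/augmented embedding pair.
    num = np.sum(orig_embs * aug_embs, axis=1)
    denom = (np.linalg.norm(orig_embs, axis=1)
             * np.linalg.norm(aug_embs, axis=1))
    sim = num / np.clip(denom, 1e-12, None)
    # Discard augmented images that deviated too far in embedding space.
    return sim >= threshold
```

Augmented images whose embeddings fall below the threshold are dropped before distillation, so the student is only trained on samples that remain faithful to the synthesized distribution.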
Submission Number: 1613