Effective and efficient conditional contrast for data-free knowledge distillation with low memory

Published: 2025 · Last Modified: 10 Nov 2025 · J. Supercomput. 2025 · CC BY-SA 4.0
Abstract: Data-free knowledge distillation has recently gained significant attention in the field of model compression, as it enables knowledge transfer from a trained teacher model to a smaller student model without requiring the original training data. Current methods often rely on generative adversarial networks (GANs) to synthesize fake samples, but this approach introduces three main issues. First, mode collapse yields instances that lack the diversity needed for downstream tasks. Second, inefficient instance synthesis makes existing methods too time-consuming to scale to large datasets. Third, the increased memory footprint makes deployment difficult. In this paper, we propose a novel paradigm called conditional contrast for data-free knowledge distillation (CC-DFKD), which integrates a conditional generative adversarial network (CGAN) with contrastive learning. The CGAN synthesizes class-specific, diverse images to address the diversity challenge, while contrastive learning enriches the student model’s feature representations to tackle the reality challenge. Additionally, compared with recent work, our simplified distillation loss reduces instance generation time and memory usage during operation, cutting generation time by 0.5–9 hours and GPU memory usage by 2000–5000 MB. Empirical results across multiple datasets validate CC-DFKD’s effectiveness and efficiency under low-memory conditions. Code is available at: https://github.com/jcynxu/CC-DFKD.
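To make the two ingredients concrete, the sketch below illustrates (not the authors' actual implementation) how class-conditional generator inputs and an InfoNCE-style contrastive objective could look. The function names `make_conditional_latent` and `info_nce_loss`, the temperature value, and the use of teacher features as positives are illustrative assumptions; the paper's exact losses and architectures are in the linked repository.

```python
import numpy as np

def make_conditional_latent(labels, num_classes, noise_dim, rng):
    """Class-conditional generator input: concatenate a one-hot class
    code with Gaussian noise, as is typical for a CGAN (assumed setup)."""
    onehot = np.eye(num_classes)[labels]                  # (B, num_classes)
    noise = rng.normal(size=(len(labels), noise_dim))     # (B, noise_dim)
    return np.concatenate([onehot, noise], axis=1)        # (B, num_classes + noise_dim)

def info_nce_loss(student_feats, teacher_feats, temperature=0.1):
    """Illustrative InfoNCE-style contrastive loss: each student feature's
    positive is the teacher feature of the same synthesized sample; the
    other teacher features in the batch serve as negatives."""
    # L2-normalise so dot products are cosine similarities
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    logits = s @ t.T / temperature                        # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched student/teacher pairs lie on the diagonal
    return -np.mean(np.diag(log_probs))
```

Pulling the student's features toward the teacher's for the same class-conditional sample, while pushing them away from other samples' features, is one way such a contrastive term can sharpen the student's representations of synthetic data.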