Generative Distribution Distillation

11 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Generative Learning, Distillation
Abstract: In this paper, we formulate knowledge distillation (KD) as a conditional generative problem and propose Generative Distribution Distillation (GenDD). A naive GenDD formulation faces two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a Split Tokenization (SplitTok) strategy, which enables stable and effective unsupervised KD. Additionally, we develop the Distribution Contraction technique to integrate label supervision into the reconstruction objective. We prove that GenDD with Distribution Contraction serves as a gradient-level surrogate for multi-task learning, enabling efficient supervised training without an explicit classification loss on multi-step sampled image representations. To evaluate the effectiveness of our method, we conduct experiments on balanced, imbalanced, and unlabeled data. Experimental results show that GenDD performs competitively in the unsupervised setting, significantly surpassing the KL baseline by 16.29\% on the ImageNet validation set. With label supervision, our ResNet-50 achieves 82.28\% top-1 accuracy on ImageNet in 600 epochs of training, establishing a new state of the art. Code is available in the Appendix.
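
To make the conditional generative framing concrete, below is a minimal, hypothetical sketch of distilling a teacher representation as a generative target: the teacher feature is split into low-dimensional tokens and the student predicts a diagonal Gaussian over each token, trained with a negative log-likelihood reconstruction loss. The names (`split_tokens`, `GenerativeDistillHead`) and the Gaussian form are illustrative assumptions, not the paper's actual SplitTok or GenDD implementation.

```python
# Hypothetical sketch: KD as conditional generation over tokenized teacher features.
import torch
import torch.nn as nn


def split_tokens(feat: torch.Tensor, num_tokens: int) -> torch.Tensor:
    """Split a (B, D) feature into (B, num_tokens, D // num_tokens) low-dim tokens."""
    b, d = feat.shape
    assert d % num_tokens == 0, "feature dim must be divisible by num_tokens"
    return feat.view(b, num_tokens, d // num_tokens)


class GenerativeDistillHead(nn.Module):
    """Maps a student feature to per-token Gaussian parameters over teacher tokens."""

    def __init__(self, student_dim: int, teacher_dim: int, num_tokens: int):
        super().__init__()
        self.num_tokens = num_tokens
        self.mu = nn.Linear(student_dim, teacher_dim)
        self.log_var = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor):
        mu = split_tokens(self.mu(student_feat), self.num_tokens)
        log_var = split_tokens(self.log_var(student_feat), self.num_tokens)
        return mu, log_var


def generative_distill_loss(student_feat, teacher_feat, head):
    """Gaussian NLL of teacher tokens under the student's predicted conditional
    distribution: one simple instance of a generative reconstruction objective."""
    mu, log_var = head(student_feat)
    target = split_tokens(teacher_feat, head.num_tokens)
    nll = 0.5 * (log_var + (target - mu) ** 2 / log_var.exp())
    return nll.mean()


if __name__ == "__main__":
    # Random tensors stand in for student / frozen teacher backbone outputs.
    head = GenerativeDistillHead(student_dim=512, teacher_dim=2048, num_tokens=8)
    s = torch.randn(4, 512)   # student feature
    t = torch.randn(4, 2048)  # teacher feature (no gradient in practice)
    loss = generative_distill_loss(s, t, head)
    loss.backward()
    print(float(loss))
```

Splitting the 2048-dim teacher feature into eight 256-dim tokens is one plausible way to sidestep the high-dimensional optimization issue the abstract mentions; the actual tokenization and sampling scheme in GenDD may differ.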
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4200