Self-distillation for diffusion models

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Generative models, Diffusion, Self-Distillation, Denoising Diffusion Models, DDIM
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce a method for performing self-distillation in diffusion models and show several benefits over teacher/student distillation.
Abstract: In recent years, diffusion models have demonstrated powerful generative capabilities. As they continue to grow in both ability and complexity, performance optimization becomes increasingly relevant. Knowledge Distillation (KD), where the output of a pre-trained teacher model is used to train a smaller student model, has been shown to greatly reduce the number of network evaluations required while retaining comparable image sample quality. KD is especially useful for diffusion models, because it can be used not only to distill a large model into a small one, but also to distill a large number of denoising iterations into a small one. Here, we show that a form of _self-distillation_ (training a subnetwork to mimic the output of the larger network, effectively distilling a network into itself) can improve distillation in diffusion models. We first show that when a pre-trained teacher model is distilled to a student network, the procedure can be turned into self-distillation by unifying the teacher and the student. Our results indicate that this leads to faster convergence at competitive sample quality. Additionally, we show in small-scale experiments that when diffusion models are trained from scratch, adding a self-distillation term to the loss can, in specific cases, help the model converge to high-quality samples more quickly.
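The abstract describes adding a self-distillation term to the training loss, in which a subnetwork of the model is trained to mimic the full network's output. Below is a minimal, self-contained PyTorch sketch of that general idea. The toy denoiser architecture, the way the subnetwork is selected (skipping a hidden block via a `use_subnet` flag), and the weighting `lambda_sd` are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Tiny noise-prediction network; the middle block is skipped in the subnetwork pass."""
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.inp = nn.Linear(dim + 1, hidden)
        self.mid = nn.Linear(hidden, hidden)   # only used by the full network
        self.out = nn.Linear(hidden, dim)

    def forward(self, x_t, t, use_subnet=False):
        h = F.silu(self.inp(torch.cat([x_t, t[:, None].float()], dim=-1)))
        if not use_subnet:
            h = F.silu(self.mid(h))            # full-capacity path
        return self.out(h)                      # predicted noise

def loss_with_self_distillation(model, x0, alphas_bar, lambda_sd=0.1):
    """Standard denoising loss plus a term training the subnetwork to match the full network."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_bar), (b,))
    noise = torch.randn_like(x0)
    a = alphas_bar[t][:, None]
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise      # forward diffusion to timestep t

    eps_full = model(x_t, t)                          # full network prediction
    loss_denoise = F.mse_loss(eps_full, noise)        # ordinary diffusion loss

    eps_sub = model(x_t, t, use_subnet=True)          # subnetwork prediction
    loss_sd = F.mse_loss(eps_sub, eps_full.detach())  # mimic the detached full output

    return loss_denoise + lambda_sd * loss_sd

# Usage example with random data and a linear noise schedule.
model = ToyDenoiser()
alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
loss = loss_with_self_distillation(model, torch.randn(8, 64), alphas_bar)
loss.backward()
```

Detaching the full network's output means the self-distillation term only pushes the subnetwork toward the larger network, not the reverse; whether and how gradients flow through the teacher path is a design choice the abstract does not specify.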
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2569