Abstract: Although diffusion models have achieved remarkable performance in image generation, their high inference latency hinders wide deployment on edge devices with scarce computing resources. Many training-free sampling methods have therefore been proposed to reduce the number of sampling steps required by diffusion models, but they perform poorly when the number of steps is very small. With the emergence of knowledge distillation, training-based methods have achieved excellent results at very low step counts. However, current methods mainly focus on designing novel diffusion model sampling schemes with knowledge distillation; how to transfer better diffusion knowledge from teacher models is a more valuable yet rarely studied problem. We therefore propose Relational Diffusion Distillation (RDD), a novel distillation method tailored specifically for diffusion models. Unlike existing methods that simply align teacher and student models at the pixel level or over feature distributions, our method introduces cross-sample relationship interaction during distillation and alleviates the memory constraints induced by multiple-sample interactions. RDD significantly enhances the effectiveness of the progressive distillation framework for diffusion models. Extensive experiments on several datasets (e.g., CIFAR-10 and ImageNet) demonstrate that our proposed RDD achieves a 1.47 FID decrease and a 256x speed-up compared to state-of-the-art diffusion distillation methods. Our code will be attached to the supplementary material.
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: In this paper, we focus on generative multimedia diffusion models and propose Relational Diffusion Distillation. We address the insufficient use of teacher information in current diffusion model distillation. With this method, we can greatly improve the image generation quality of distilled models at very low sampling steps. Our method further advances the application of knowledge distillation to diffusion models and promotes their low-latency deployment in real-world settings. It enables more responsive multimedia systems that generate content with high realism and diversity, further boosting the advancement of multimedia.
Supplementary Material: zip
Submission Number: 1827