CoDM: A Co-design Framework for Efficient Sparse Diffusion Models

Published: 11 Jun 2025, Last Modified: 10 Jul 2025 · ES-FoMo III · CC BY 4.0
Keywords: GPU, Sparse Tensor Core, Diffusion Model
Abstract: Diffusion models have emerged as a powerful class of generative models that excel at capturing complex data distributions and producing realistic, high-fidelity samples. However, these benefits come at the cost of expensive computation and memory requirements due to their iterative denoising process. The cost is especially significant for high-resolution images, videos, 3D data, or long sequences. In this paper, we propose CoDM, a co-design framework that seamlessly integrates model compression techniques with the sparse tensor cores of NVIDIA Hopper H100 GPUs. By leveraging specialized hardware capabilities and jointly optimizing the model compression scheme and storage format, CoDM achieves significant model speedup while maintaining data generation quality. Specifically, our approach enhances diffusion models through several key strategies, namely reducing inference steps and model weights through a novel hierarchical pruning scheme, improving memory efficiency via a new sparse storage format, and leveraging TensorRT optimization and the specialized cores of GPU hardware accelerators. This co-design approach addresses the computational challenges of diffusion models, making them more accessible for real-world applications. Experimental results in a Text-to-Image application demonstrate that our approach surpasses the state-of-the-art, achieving a 7.4-fold speedup on the ImageNet (256×256) dataset and an 11.5-fold speedup on the CIFAR-10 (32×32) dataset, all while preserving the quality of the generated images with a similar or lower Fréchet Inception Distance (FID) score.
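For context on the kind of hardware-supported sparsity the abstract refers to: NVIDIA sparse tensor cores accelerate matrices pruned to a 2:4 structured pattern (at most two nonzeros in every group of four consecutive weights). The sketch below is not the paper's hierarchical pruning scheme or its custom storage format; it is a minimal illustration, assuming PyTorch 2.1+ and an Ampere/Hopper-class GPU, of how a weight matrix can be pruned to the 2:4 pattern and handed to PyTorch's semi-structured sparse representation so that matmuls can dispatch to sparse tensor cores.

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude values in every group of four along the
    last dimension, producing the 2:4 structured-sparsity pattern that NVIDIA
    sparse tensor cores can accelerate. Illustrative magnitude pruning only."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "in_features must be a multiple of 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Keep the 2 largest-magnitude entries per group of 4, zero the rest.
    keep = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, keep, True)
    return (groups * mask).reshape(out_features, in_features)

# Hypothetical usage: prune a linear layer, then (on a suitable GPU) convert
# its weights to PyTorch's semi-structured sparse format.
linear = torch.nn.Linear(256, 128, bias=False)
linear.weight.data = prune_2_to_4(linear.weight.data)
if torch.cuda.is_available():
    linear = linear.half().cuda()
    linear.weight = torch.nn.Parameter(
        torch.sparse.to_sparse_semi_structured(linear.weight)
    )
```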
Submission Number: 38