Distribution-Aware Diffusion Model Quantization via Distortion Minimization

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Diffusion model, image/video generation, post-training quantization, Taylor series expansion approximation
Abstract: Diffusion models have achieved strong performance in image/video generation and related tasks. However, this performance comes at substantial computational cost due to their large number of parameters, which hinders deployment on mobile devices and limits the practical applications of diffusion models. In this work, we propose a new post-training quantization approach designed to reduce the computational complexity and memory cost of diffusion models. As the output distributions of diffusion models differ significantly across timesteps, our approach first splits the timesteps into groups and optimizes the quantization configuration of each group separately. We then formulate the quantization of each group as a rate-distortion optimization problem that minimizes the output distortion caused by quantization under a model size constraint. Because output distortion is closely related to model accuracy, minimizing it allows our approach to compress diffusion models to low bit widths without hurting accuracy. Furthermore, our approach approximates the output distortion with a Taylor series expansion and finds the optimal bit allocation across layers in linear time. Extensive experiments on four datasets, CIFAR-10, CelebA-HQ, LSUN-Bedroom, and LSUN-Church, validate the effectiveness of our approach. Empirical results show that it notably outperforms state-of-the-art methods and can reduce the bit width of diffusion models to 5-6 bits while maintaining high accuracy.
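The abstract does not spell out the optimization in detail, so the following is a minimal sketch of how such a rate-distortion bit allocation could look, assuming a second-order Taylor proxy for per-layer output distortion and a Lagrangian relaxation solved by bisection over the multiplier. All names here (`taylor_distortion`, `allocate_bits`, the candidate bit widths, and the toy sensitivity values) are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical sketch of grouped rate-distortion bit allocation.
# Objective (per timestep group): minimize sum_l D_l(b_l)
# subject to sum_l n_l * b_l <= budget_bits.

BIT_CHOICES = [4, 5, 6, 8]  # assumed candidate bit widths per layer

def taylor_distortion(sensitivity, num_params, bits):
    """Second-order Taylor proxy for one layer's output distortion.

    Assumes the quantization error variance of a uniform quantizer
    scales as 2^(-2*bits) and that `sensitivity` summarizes the
    layer's curvature (e.g., Hessian-trace) statistics.
    """
    return sensitivity * num_params * 2.0 ** (-2 * bits)

def allocate_bits(sensitivities, num_params, budget_bits):
    """Per-layer bit allocation under a total model-size budget.

    For a fixed Lagrange multiplier `lam`, each layer's best bit
    width is chosen independently, so one sweep costs
    O(num_layers * |BIT_CHOICES|), i.e. linear in the number of
    layers. `lam` is then tuned by bisection to meet the budget.
    Requires budget_bits >= min(BIT_CHOICES) * sum(num_params).
    """
    def solve(lam):
        alloc, size = [], 0
        for s, n in zip(sensitivities, num_params):
            best = min(BIT_CHOICES,
                       key=lambda b: taylor_distortion(s, n, b) + lam * n * b)
            alloc.append(best)
            size += best * n
        return alloc, size

    lo, hi = 0.0, 1.0
    while solve(hi)[1] > budget_bits:  # grow lam until budget is met
        hi *= 2
    for _ in range(50):                # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if solve(mid)[1] > budget_bits:
            lo = mid
        else:
            hi = mid
    return solve(hi)[0]

# Toy usage: 3 layers, target average of ~5 bits per weight.
sens = [2.0, 0.5, 1.0]               # assumed Taylor-based sensitivities
params = [1_000_000, 4_000_000, 2_000_000]
print(allocate_bits(sens, params, budget_bits=5 * sum(params)))
```

Under this reading, the linear time complexity claimed in the abstract would follow from the per-layer independence of the Lagrangian subproblem: each sweep touches every layer once, and the bisection adds only a constant number of sweeps.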
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6030