Keywords: diffusion models, quantization
Abstract: Diffusion models for image generation have achieved notable success across a wide range of applications. However, these models often incur substantial storage overhead and inference latency, severely hampering their deployment on resource-constrained devices. Post-training quantization (PTQ) has recently emerged as a promising way to reduce model size and inference latency by converting floating-point values into lower bit-precision representations. Nevertheless, most existing PTQ approaches neglect the accumulating quantization errors arising from the substantial distribution variations across distinct layers and blocks at different timesteps, and thus suffer significant accuracy degradation. To address these issues, we propose a novel temporal distribution-aware quantization (DAQ) method for diffusion models. DAQ first develops a distribution-aware finetuning (DAF) framework to dynamically suppress the accumulating quantization errors during calibration. Subsequently, DAQ employs a full-precision noise estimation network to optimize the quantized noise estimation network at each sampling timestep, further aligning the quantizers with the varying input distributions. We evaluate the proposed method on widely used public benchmarks for image generation. The experimental results clearly demonstrate that DAQ achieves state-of-the-art performance compared to existing methods. We also show that DAQ can be applied as a plug-and-play module to existing PTQ models, remarkably boosting their overall performance.
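The abstract's two main ingredients, converting floating-point values to lower bit-precision and using a full-precision noise estimation network to optimize its quantized counterpart at each sampling timestep, can be illustrated with a minimal sketch. The code below assumes a standard symmetric uniform quantizer with a straight-through estimator and a simple per-timestep MSE distillation loss; `ToyEpsNet`, `uniform_quantize`, `fake_quantize_weights`, and `timestep_distillation_loss` are hypothetical names introduced only for illustration and do not reproduce the paper's actual DAF framework or calibration procedure.

```python
# Illustrative sketch only (not the submission's implementation).
import torch
import torch.nn as nn


def uniform_quantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform fake quantization of a tensor to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: gradients bypass the rounding operation.
    return x + (x_q - x).detach()


def fake_quantize_weights(net: nn.Module, n_bits: int = 8) -> None:
    """Replace each parameter with its fake-quantized version (simulates low-bit weights)."""
    with torch.no_grad():
        for p in net.parameters():
            p.copy_(uniform_quantize(p, n_bits))


class ToyEpsNet(nn.Module):
    """Stand-in for a noise estimation network; real diffusion UNets are far larger."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.fc = nn.Linear(dim + 1, dim)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t_feat = t.float().unsqueeze(-1) / 1000.0   # crude timestep conditioning
        return self.fc(torch.cat([x_t, t_feat], dim=-1))


def timestep_distillation_loss(fp_net: nn.Module,
                               q_net: nn.Module,
                               x_t: torch.Tensor,
                               t: torch.Tensor) -> torch.Tensor:
    """MSE between full-precision and quantized noise predictions at timestep t."""
    with torch.no_grad():
        eps_fp = fp_net(x_t, t)   # teacher: full-precision noise estimate
    eps_q = q_net(x_t, t)         # student: quantized noise estimate
    return torch.mean((eps_q - eps_fp) ** 2)


if __name__ == "__main__":
    torch.manual_seed(0)
    fp_net, q_net = ToyEpsNet(), ToyEpsNet()
    q_net.load_state_dict(fp_net.state_dict())
    fake_quantize_weights(q_net, n_bits=4)   # simulate 4-bit weights
    x_t = torch.randn(8, 16)                 # noisy samples at timestep t
    t = torch.randint(0, 1000, (8,))
    loss = timestep_distillation_loss(fp_net, q_net, x_t, t)
    print(f"per-timestep distillation loss: {loss.item():.4f}")
```

In this toy setup the loss would be minimized separately at each sampling timestep, which mirrors, at a very high level, the idea of aligning the quantized network with the full-precision one under timestep-dependent input distributions.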
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 672