De-biasing Diffusion: Data-Free FP8 Quantization of Text-to-Image Models with Billions of Parameters
Keywords: Diffusion, Quantization, Floating-Point, Data-Free, Text-To-Image
TL;DR: We propose a simple, data-free FP8 quantization method for diffusion models that significantly improves generation quality across all open-source text-to-image models.
Abstract: Diffusion neural networks have become the go-to solution for automatic image generation, but the generation process is expensive in memory, energy, and computation. Several works have aimed to reduce this cost via quantization to 8 bits or fewer. Despite that, half-precision (FP16) remains the default mode for large text-to-image generation models, where reducing numerical precision becomes more challenging. In this work, we show that reducing quantization bias can matter more than reducing quantization (mean square) error. We propose a data-free method for model quantization whose goal is to produce images indistinguishable from those generated by large, full-precision diffusion models. We show that simple methods such as stochastic rounding can decrease quantization bias and improve image generation quality at little to no cost. To close the remaining gap between full-precision and quantized models, we propose a feasible method for partial stochastic rounding of weights. Using the MS-COCO dataset as the baseline, we show that our quantization methods achieve FID scores as good as those of the full-precision model. Moreover, our methods reduce the distortion of generated images relative to those produced by the full-precision model, and this distortion decreases as the number of diffusion steps grows.
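Note: the de-biasing idea named in the abstract is stochastic rounding. As a minimal sketch of that idea (not the paper's actual FP8 scheme, which would use a non-uniform floating-point grid and partial rounding of weights; the helper name and the uniform grid here are illustrative assumptions), rounding up with probability equal to the fractional remainder makes the quantizer unbiased in expectation:

    import torch

    def stochastic_round_to_grid(x: torch.Tensor, step: float) -> torch.Tensor:
        """Stochastically round x to a uniform grid with spacing `step`.

        Rounding up with probability equal to the fractional remainder gives
        E[Q(x)] = x, i.e. zero quantization bias, unlike round-to-nearest.
        """
        scaled = x / step
        lower = torch.floor(scaled)
        frac = scaled - lower                     # fractional part in [0, 1)
        round_up = torch.rand_like(frac) < frac   # P(round up) = frac
        return (lower + round_up.to(x.dtype)) * step

    # Sanity check: the mean rounding error vanishes, even though the
    # per-element (mean square) error exceeds that of round-to-nearest.
    w = torch.randn(1_000_000)
    q = stochastic_round_to_grid(w, step=2.0 ** -3)
    print((q - w).mean())        # ~0: unbiased in expectation
    print((q - w).abs().mean())  # nonzero per-element error

This trades a slightly higher per-weight error for zero expected bias, which is consistent with the abstract's claim that reducing quantization bias can matter more than reducing quantization (mean square) error.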
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3834