De-biasing Diffusion: Data-Free FP8 Quantization of Text-to-Image Models with Billions of Parameters

24 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Diffusion, Quantization, Floating-Point, Data-Free, Text-To-Image
TL;DR: We propose a simple, data-free method for FP8 quantization of diffusion models, yielding significant improvements for all open-source text-to-image models.
Abstract: Diffusion neural networks have become the go-to solution for tasks involving automatic image generation, but the generation process is expensive in terms of memory, energy, and computational cost. Several works have aimed to reduce this cost via quantization to 8 bits or less. Despite that, half-precision (FP16) is still the default mode for large text-to-image generation models --- where reducing numerical precision becomes more challenging. In this work, we show that reducing the quantization bias can be more important than reducing the quantization mean-squared error. We propose a data-free method for model quantization, with the goal of producing images that are indistinguishable from those generated by large, full-precision diffusion models. We show that simple methods like stochastic rounding can decrease the quantization bias and improve image generation quality at little to no cost. To close the remaining gap between full-precision and quantized models, we suggest a feasible method for partial stochastic rounding of weights. Using the MS-COCO dataset as the baseline, we show that our quantization methods achieve FID scores as good as those of the full-precision model. Moreover, our methods decrease the quantization-induced distortion relative to images generated by the full-precision model, and this distortion shrinks as the number of diffusion steps grows.
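To illustrate why stochastic rounding removes quantization bias, here is a minimal sketch on an assumed uniform quantization grid (not the paper's actual FP8 scheme); `DELTA`, `round_nearest`, and `round_stochastic` are illustrative names, not the authors' code. For a fixed value off the grid, round-to-nearest errs in the same direction every time, while stochastic rounding is correct in expectation:

```python
import numpy as np

DELTA = 0.25  # illustrative step of a uniform quantization grid
rng = np.random.default_rng(0)

def round_nearest(x):
    """Deterministic round-to-nearest: biased for any fixed x off the grid."""
    return np.round(x / DELTA) * DELTA

def round_stochastic(x, size):
    """Round down/up with probability given by the fractional position
    within the grid cell, so E[round_stochastic(x)] == x (zero bias)."""
    lower = np.floor(x / DELTA)
    frac = x / DELTA - lower
    up = rng.random(size) < frac          # round up with probability frac
    return (lower + up) * DELTA

x = 0.17  # a fixed weight that does not lie on the grid
print(round_nearest(x))                        # 0.25 every time -> bias +0.08
print(round_stochastic(x, 1_000_000).mean())   # ~0.17 -> bias ~0
```

Per-element, stochastic rounding has higher variance (hence higher MSE) than round-to-nearest, which is consistent with the abstract's point that reducing bias can matter more than reducing mean-squared error.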
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3834