Keywords: Diffusion Models, Model Quantization, Text-to-image Generation
TL;DR: This paper proposes DFastQ, a framework that accelerates quantization-aware training (QAT) for diffusion models from a difficulty-aware perspective in the timestep dimension.
Abstract: Diffusion models have demonstrated remarkable power in various generation tasks. Nevertheless, the large computational cost during inference remains a troublesome issue for diffusion models, especially for large pretrained models such as Stable Diffusion. Quantization-aware training (QAT) is an effective method to reduce both the memory and time costs of diffusion models while maintaining good performance. However, QAT methods usually suffer from the high cost of retraining the large pretrained model, which restricts the efficient deployment of diffusion models. To alleviate this problem, we propose DFastQ (Diffusion Fast QAT), a framework that accelerates QAT from a difficulty-aware perspective in the timestep dimension. Specifically, we first propose to adaptively identify the difficulty of each timestep according to the oscillation of its training loss curve. We then propose a difficulty-aware time allocation module that dynamically allocates more training time to difficult timesteps to speed up the convergence of QAT. Its key component is a timestep drop mechanism consisting of a drop probability predictor and a pair of adversarial losses. We conduct a series of experiments across different Stable Diffusion models, quantization settings, and sampling strategies, demonstrating that our method accelerates QAT by at least 24% while achieving comparable or even better performance.
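The abstract describes the timestep drop mechanism only at a high level. As a rough illustration, the sketch below shows one plausible way a drop probability predictor driven by loss-curve oscillation could be wired up. Everything here (the names `oscillation_score` and `DropPredictor`, the oscillation proxy, and the keep/drop sampling) is a hypothetical reconstruction under stated assumptions, not the authors' implementation; the pair of adversarial losses mentioned in the abstract is omitted.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the difficulty-aware timestep drop idea described
# in the abstract. All names are illustrative assumptions, not released code.

def oscillation_score(loss_history: torch.Tensor) -> torch.Tensor:
    """Per-timestep difficulty proxy: mean magnitude of successive loss changes.

    loss_history: (T, W) tensor holding the last W recorded losses for each
    of T timesteps. Higher oscillation is treated as higher difficulty.
    """
    deltas = loss_history[:, 1:] - loss_history[:, :-1]   # (T, W-1)
    return deltas.abs().mean(dim=1)                       # (T,)

class DropPredictor(nn.Module):
    """Maps a per-timestep difficulty score to a drop probability."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, difficulty: torch.Tensor) -> torch.Tensor:
        # Easy (low-oscillation) timesteps should receive a high drop
        # probability, so updates concentrate on difficult timesteps.
        return self.net(difficulty.unsqueeze(-1)).squeeze(-1)

# Usage: sample a batch of timesteps, keep each with probability
# (1 - p_drop), and run the usual QAT step only on the kept ones.
T, W, batch = 1000, 16, 64
loss_history = torch.rand(T, W)            # placeholder loss records
predictor = DropPredictor()

t = torch.randint(0, T, (batch,))          # sampled timesteps
p_drop = predictor(oscillation_score(loss_history))[t]
keep = torch.bernoulli(1.0 - p_drop).bool()
t_kept = t[keep]                           # timesteps that get a QAT update
```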
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6906