Track: tiny / short paper (up to 4 pages)
Keywords: Stable Diffusion, Network Pruning, Step Distillation
Abstract: Recent SoTA text-to-image diffusion models achieve impressive generation quality, but their computational cost remains prohibitively large. Network pruning and step distillation are two widely used compression techniques for reducing model size and inference steps. This work presents improved techniques in both aspects to train smaller and faster diffusion models at a low training cost. Specifically, compared to prior SoTA counterparts, we introduce a structured pruning method that removes insignificant weight blocks based on an improved performance-sensitivity criterion. To recover performance after pruning, we propose a CFG-aware retraining loss, which proves critical to generation quality. Finally, a modified CFG-aware step distillation reduces the number of inference steps. Empirically, our method prunes the U-Net parameters of SD v2.1 base by 46\% and reduces the inference steps from 25 to 8, achieving an overall $3.0\times$ wall-clock inference speedup. Our 8-step model significantly outperforms the 25-step BK-SDM, the prior SoTA for cheaply trained Stable Diffusion, while being even smaller.
Submission Number: 119
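
To illustrate what "CFG-aware" means in the distillation step, the sketch below shows a generic classifier-free-guidance-aware distillation loss in PyTorch: the student is trained to match the teacher's guided (conditional plus unconditional, combined with a guidance scale) noise prediction rather than only its conditional prediction. This is a minimal sketch of the general technique, not the authors' implementation; the function names cfg_noise_pred and cfg_aware_distill_loss, and the exact loss form, are assumptions.

import torch
import torch.nn.functional as F

def cfg_noise_pred(model, x_t, t, cond_emb, uncond_emb, guidance_scale):
    # Classifier-free guidance: combine conditional and unconditional
    # noise predictions, eps = eps_uncond + w * (eps_cond - eps_uncond).
    eps_cond = model(x_t, t, cond_emb)
    eps_uncond = model(x_t, t, uncond_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def cfg_aware_distill_loss(student, teacher, x_t, t, cond_emb, uncond_emb, w):
    # Distill the teacher's *guided* prediction into the student so that
    # the student reproduces CFG behavior at few sampling steps.
    # (Hypothetical sketch; the paper's actual loss may differ.)
    with torch.no_grad():
        target = cfg_noise_pred(teacher, x_t, t, cond_emb, uncond_emb, w)
    pred = cfg_noise_pred(student, x_t, t, cond_emb, uncond_emb, w)
    return F.mse_loss(pred, target)

In practice, the guidance scale w is often sampled randomly per batch so the distilled student remains usable across a range of guidance strengths; whether this paper does so is not stated in the abstract.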