Track: tiny / short paper (up to 4 pages)
Keywords: Stable Diffusion, Network Pruning, Step Distillation
Abstract: Recent SoTA text-to-image diffusion models achieve impressive generation quality, but their computational cost remains prohibitively large. Network pruning and step distillation are two widely used compression techniques for reducing model size and inference steps. This work presents improved techniques in both aspects to train smaller and faster diffusion models at a low training cost. Specifically, compared to prior SoTA counterparts, we introduce a structured pruning method that removes insignificant weight blocks based on an improved performance-sensitivity criterion. To recover performance after pruning, we propose a CFG-aware retraining loss, which proves critical to generation quality. Finally, a modified CFG-aware step distillation reduces the number of inference steps. Empirically, our method prunes the U-Net parameters of SD v2.1 base by 46\% and reduces the inference steps from 25 to 8, achieving an overall $3.0\times$ wall-clock inference speedup. Our 8-step model significantly outperforms the 25-step BK-SDM, the prior SoTA for cheaply trained Stable Diffusion, while being even smaller.
Submission Number: 119
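
To illustrate what "CFG-aware" means in the distillation step, the sketch below shows a generic classifier-free-guidance-aware distillation loss in PyTorch: the student is trained to match the teacher's guided (conditional plus unconditional, combined with a guidance scale) noise prediction rather than only its conditional prediction. This is a minimal sketch of the general technique, not the authors' implementation; the function names cfg_noise_pred and cfg_aware_distill_loss, and the exact loss form, are assumptions.

import torch
import torch.nn.functional as F

def cfg_noise_pred(model, x_t, t, cond_emb, uncond_emb, guidance_scale):
    # Classifier-free guidance: combine conditional and unconditional
    # noise predictions, eps = eps_uncond + w * (eps_cond - eps_uncond).
    eps_cond = model(x_t, t, cond_emb)
    eps_uncond = model(x_t, t, uncond_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def cfg_aware_distill_loss(student, teacher, x_t, t, cond_emb, uncond_emb, w):
    # Distill the teacher's *guided* prediction into the student so that
    # the student reproduces CFG behavior at few sampling steps.
    # (Hypothetical sketch; the paper's actual loss may differ.)
    with torch.no_grad():
        target = cfg_noise_pred(teacher, x_t, t, cond_emb, uncond_emb, w)
    pred = cfg_noise_pred(student, x_t, t, cond_emb, uncond_emb, w)
    return F.mse_loss(pred, target)

In practice, the guidance scale w is often sampled randomly per batch so the distilled student remains usable across a range of guidance strengths; whether this paper does so is not stated in the abstract.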