Abstract: Stable Diffusion generates high-quality and diverse images, but it suffers from high computational cost because of its heavy model and step-by-step denoising process. To address these issues, we propose TSA-SD, a token-aware and step-aware acceleration approach for Stable Diffusion. We first build a simple and efficient baseline by combining existing intra-step and cross-step acceleration strategies, including token merging and feature caching, in Stable Diffusion. To improve the image generation quality of this baseline, we introduce token-aware merging–unmerging and step-aware acceleration. Token-aware merging–unmerging selects informative tokens when merging and recovers merged tokens using token ratio information; it thereby exploits token-specific information and reduces token information loss. In addition, we observe that different steps exhibit different functional linearity, and we propose step-aware acceleration, which performs different merging operations at different steps according to their functional linearity. With these two modules, TSA-SD generates high-quality images at high speed. We conduct experiments on two widely used datasets, ImageNet and MS-COCO. The experimental results demonstrate the effectiveness and efficiency of our method. For instance, on the ImageNet validation set, compared to Stable Diffusion, ToMe-SD has a lower FID of 33.68 at 1.96× speedup, while our method achieves a lower FID of 32.49 at 4.68× speedup.
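To make the merging and unmerging idea concrete, the following is a minimal sketch of plain similarity-based token merging followed by copy-based unmerging, in the spirit of the token-merging baseline mentioned in the abstract. It is not the token-aware variant proposed here: the even/odd source/destination split, the cosine-similarity criterion, the averaging rule, and the function names `merge_tokens` and `unmerge_tokens` are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(tokens, r):
    """Merge the r source tokens that are most similar to a destination token.

    Assumption: even-indexed tokens are sources, odd-indexed are destinations.
    Returns (kept, assignment), where assignment[i] is the index in `kept`
    that original token i was folded into.
    """
    n = len(tokens)
    src = list(range(0, n, 2))
    dst = list(range(1, n, 2))
    if not dst or r <= 0:
        return [list(t) for t in tokens], list(range(n))

    # For every source token, find its most similar destination token.
    best = {}
    for i in src:
        j = max(dst, key=lambda d: cosine(tokens[i], tokens[d]))
        best[i] = (cosine(tokens[i], tokens[j]), j)

    # Only the r most similar sources are merged away.
    merged_src = set(sorted(src, key=lambda i: best[i][0], reverse=True)[:r])
    kept_ids = [i for i in range(n) if i not in merged_src]
    pos = {orig: k for k, orig in enumerate(kept_ids)}

    sums = [list(tokens[i]) for i in kept_ids]
    counts = [1] * len(kept_ids)
    assignment = [0] * n
    for i in range(n):
        if i in merged_src:
            k = pos[best[i][1]]
            sums[k] = [s + x for s, x in zip(sums[k], tokens[i])]
            counts[k] += 1
        else:
            k = pos[i]
        assignment[i] = k

    # Each kept token becomes the average of everything merged into it.
    kept = [[s / c for s in vec] for vec, c in zip(sums, counts)]
    return kept, assignment


def unmerge_tokens(kept, assignment):
    """Restore the original token count by copying each merged value back."""
    return [list(kept[k]) for k in assignment]


tokens = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kept, assignment = merge_tokens(tokens, r=1)
restored = unmerge_tokens(kept, assignment)
```

In this sketch the unmerged copies are identical to the merged average, so whatever distinguished the original tokens is lost. That loss is the kind of token information loss the abstract says token-aware merging–unmerging is meant to reduce.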