T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: diffusion model, model stitching, efficient sampling
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose Trajectory Stitching, a simple but effective technique that leverages small pretrained diffusion models to accelerate sampling in large pretrained diffusion models without training.
Abstract: Diffusion probabilistic models (DPMs) have achieved great success in generating high-quality data such as images and videos. However, sampling from DPMs at inference time is often expensive for high-quality generation and typically requires hundreds of steps with a large network model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve generation efficiency with little or no loss in generation quality. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement for the larger DPM and switches to the larger DPM at a later stage. The key reason why T-Stitch works is that different diffusion models learn similar encodings under the same training data distribution. While smaller models are not as effective at refining high-frequency details in the later denoising steps, they are still capable of generating good global structures in the early steps. Thus, smaller models can be used in the early steps to reduce the computational cost. Notably, T-Stitch does not need any further training and uses only pretrained models, so it can be easily combined with other fast sampling techniques to obtain further efficiency gains across different architectures and samplers. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without a performance drop on class-conditional ImageNet generation. By allocating different fractions of the small and large DPMs along the sampling trajectory, we can achieve flexible speed-quality trade-offs. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained Stable Diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo.
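A minimal sketch of the stitching idea described in the abstract, assuming the two pretrained DPMs share the same latent space and noise schedule and expose a hypothetical `denoise_step(x, t)` single-step update (e.g., one DDIM step); names and signatures here are illustrative, not the authors' released API:

```python
import torch

@torch.no_grad()
def t_stitch_sample(small_model, large_model, noise, timesteps, small_frac=0.4):
    """Run the small DPM for the first `small_frac` of the sampling trajectory
    (where global structure is formed), then switch to the large DPM for the
    remaining steps (where high-frequency details are refined)."""
    switch_point = int(len(timesteps) * small_frac)
    x = noise
    for i, t in enumerate(timesteps):
        # Early steps: cheap small model; later steps: large model.
        model = small_model if i < switch_point else large_model
        x = model.denoise_step(x, t)  # hypothetical one-step denoising update
    return x
```

Since the fraction handled by the small model is the only knob, sweeping `small_frac` from 0 to 1 traces out the speed-quality trade-off curve, and the same wrapper composes with any sampler or solver that exposes a per-step update.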
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 901