Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: In this paper, we present a new dual-sampling framework, called Morse, to accelerate diffusion models losslessly.
Abstract: In this paper, we present $Morse$, a simple dual-sampling framework for accelerating diffusion models losslessly. The key insight of Morse is to reformulate the iterative generation (from noise to data) process by taking advantage of fast jump sampling and adaptive residual feedback strategies. Specifically, Morse involves two models, called $Dash$ and $Dot$, that interact with each other. The Dash model is simply the pre-trained diffusion model of any type, but operates in a jump sampling regime, creating sufficient room for improving sampling efficiency. The Dot model is significantly faster than the Dash model and is learned to generate residual feedback conditioned on the observations at the current jump sampling point on the trajectory of the Dash model, lifting the noise estimate to easily match the next-step estimate that the Dash model would produce without jump sampling. By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse can flexibly attain the desired image generation quality while improving overall runtime efficiency. With our proposed weight sharing strategy between the Dash and Dot models, Morse is efficient for both training and inference. Our method shows a lossless speedup of 1.78$\times$ to 3.31$\times$ on average over a wide range of sampling step budgets relative to 9 baseline diffusion models on 6 image generation tasks. Furthermore, we show that our method can also be generalized to improve the Latent Consistency Model (LCM-SDXL, which is already accelerated with the consistency distillation technique) tailored for few-step text-to-image synthesis. The code and models are available at https://github.com/deep-optimization/Morse.
Lay Summary: A diffusion model generates high-quality images by iteratively converting noise into an image. The full generation process consists of at most $T$ sampling steps, where $T$ is determined by the sampling method, and is usually very time-consuming. Therefore, most diffusion methods adopt jump sampling for acceleration: by using only $t$ steps (usually much smaller than $T$), i.e., jumping over $T-t$ sampling steps, we obtain faster generation but lower image quality. We want to compensate for this quality degradation and achieve a better tradeoff between quality and latency. In this paper, we present a simple framework called $Morse$ that accelerates a pre-trained diffusion model: given a desired image quality, Morse helps the model reach that quality at lower latency. For a diffusion process that jumps over a number of steps, Morse adds several extra steps between each pair of adjacent jump points. In these extra steps, we do not use the pre-trained diffusion model but instead introduce an extra, faster model. We name the pre-trained diffusion model Dash and the extra model Dot. Per sampling step, Dot is several times faster than Dash. While Dash only takes information about the current state of the generation process from noise to image, we additionally provide Dot with trajectory information about the previous state, so Dot can perform as well as Dash. With our proposed strategy, the Dot model can be trained efficiently, and Morse is efficient for both training and inference. Our method shows a lossless speedup of 1.78$\times$ to 3.31$\times$ on average over different numbers of sampling steps relative to 9 baseline diffusion models on 6 image generation tasks.
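The sketch below illustrates how such a time-interleaved Dash/Dot loop could look, assuming a DDIM-style epsilon-prediction sampler. All names here (`TinyUNet`, `morse_sample`, `ddim_step`, `dot_per_jump`, the feedback conditioning interface) are hypothetical placeholders for illustration, not the authors' released API; see the repository linked below for the official implementation.

```python
# Minimal, runnable sketch of Morse-style dual sampling (hypothetical interface).
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Stand-in noise predictor; a real Dash/Dot model would be a U-Net or DiT."""
    def __init__(self, channels=3, extra_in=0):
        super().__init__()
        self.net = nn.Conv2d(channels + extra_in, channels, 3, padding=1)

    def forward(self, x, t, feedback=None):
        # The timestep t is ignored by this toy network; a real model embeds it.
        inp = x if feedback is None else torch.cat([x, feedback], dim=1)
        return self.net(inp)


def ddim_step(x, eps, alpha_t, alpha_s):
    """Deterministic DDIM update from noise level alpha_bar_t to alpha_bar_s."""
    x0 = (x - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_s.sqrt() * x0 + (1 - alpha_s).sqrt() * eps


@torch.no_grad()
def morse_sample(dash, dot, alphas, dash_steps=5, dot_per_jump=3, shape=(1, 3, 32, 32)):
    """Interleave a slow Dash model (run only at sparse jump points) with a fast
    Dot model (run at the intermediate timesteps), feeding Dot the observation
    from the last Dash call so it can supply residual feedback on the noise estimate."""
    T = len(alphas) - 1
    x = torch.randn(shape)
    jump_points = torch.linspace(T, 0, dash_steps + 1).long().tolist()  # sparse Dash schedule
    for hi, lo in zip(jump_points[:-1], jump_points[1:]):
        eps_dash = dash(x, hi)  # expensive call at the jump point
        # Fine-grained timesteps between two jump points, handled by the cheap Dot model.
        fine = torch.linspace(hi, lo, dot_per_jump + 1).long().tolist()
        for t, s in zip(fine[:-1], fine[1:]):
            # Dot predicts a residual correction conditioned on Dash's observation.
            eps = eps_dash + dot(x, t, feedback=eps_dash)
            x = ddim_step(x, eps, alphas[t], alphas[s])
    return x


if __name__ == "__main__":
    alphas = torch.linspace(0.999, 0.01, 1001)  # toy alpha_bar schedule (index = timestep)
    dash = TinyUNet()                           # plays the role of the pre-trained model
    dot = TinyUNet(extra_in=3)                  # takes the Dash observation as extra input
    img = morse_sample(dash, dot, alphas)
    print(img.shape)
```

The point of the sketch is the cost structure: each expensive Dash evaluation is amortized over several cheap Dot refinements that reuse the Dash observation as conditioning, which is how the interleaved schedule can trade latency against quality.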
Link To Code: https://github.com/deep-optimization/Morse
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion models, image generation, text-to-image generation, model acceleration
Submission Number: 8722