Phased Consistency Models

Published: 25 Sept 2024 · Last Modified: 16 Jan 2025 · NeurIPS 2024 poster · CC BY 4.0
Keywords: Consistency Models, Diffusion Models, Distillation
TL;DR: An analysis of consistency models and a training design that improve text-to-image and text-to-video generation
Abstract: Consistency Models (CMs) have made significant progress in accelerating the generation of diffusion models. However, their application to high-resolution, text-conditioned image generation in the latent space remains unsatisfactory. In this paper, we identify three key flaws in the current design of Latent Consistency Models (LCMs). We investigate the reasons behind these limitations and propose Phased Consistency Models (PCMs), which generalize the design space and address the identified limitations. Our evaluations demonstrate that PCMs outperform LCMs across 1–16 step generation settings. While PCMs are specifically designed for multi-step refinement, their 1-step generation results are comparable to those of previous state-of-the-art methods designed specifically for 1-step generation. Furthermore, we show that the PCM methodology is versatile and applies to video generation, enabling us to train a state-of-the-art few-step text-to-video generator. Our code is available at https://github.com/G-U-N/Phased-Consistency-Model.
Primary Area: Generative models
Submission Number: 2513
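
To make the abstract's "multi-step refinement" claim concrete, below is a minimal sketch, not taken from the paper or its repository, of what phased few-step sampling might look like, assuming the denoising trajectory is split into a small number of sub-trajectories (phases) and the network jumps from the start of each phase to its end. The `model` callable and the uniform phase boundaries are hypothetical placeholders for illustration only.

```python
import torch

@torch.no_grad()
def phased_consistency_sample(model, shape, num_phases=4, T=1000, device="cpu"):
    # Phase boundaries over the trajectory [T, 0], e.g. [1000, 750, 500, 250, 0] for 4 phases.
    edges = torch.linspace(T, 0, num_phases + 1, device=device)
    x = torch.randn(shape, device=device)  # start from pure noise at t = T
    for i in range(num_phases):
        t_start, t_end = edges[i], edges[i + 1]
        # One network evaluation per phase: map the latent at t_start to the
        # phase endpoint at t_end (hypothetical model signature).
        x = model(x, t_start, t_end)
    return x
```

Under these assumptions, the number of sampling steps equals the number of phases, so 1-step generation corresponds to a single phase covering the whole trajectory, while 16-step generation uses 16 shorter phases.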