Abstract: We propose the Compositional Conditional Consistency Model (CCCM), a distilled version of CCDM that enables image generation in just 2–4 steps. CCCM retains the compositional zero-shot generation capability of its teacher while improving inference efficiency via consistency distillation. We further propose a modified consistency distillation framework that explores fusion strategies blending teacher-predicted and diffusion-formulated supervision signals. Experiments on CelebA show that CCCM achieves a lower (better) FID than CCDM, and that fusion improves accuracy on unseen compositions, although the underlying causes require further investigation.
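The abstract does not specify how the two supervision signals are blended; a minimal sketch, assuming a simple convex (linear) fusion of the teacher's prediction and the diffusion-formulated target with a hypothetical weight `alpha`, might look like:

```python
import numpy as np

def fused_target(teacher_pred, diffusion_target, alpha=0.5):
    """Blend teacher-predicted and diffusion-formulated supervision signals.

    `alpha` is a hypothetical fusion weight: alpha=1 recovers the pure
    teacher target, alpha=0 the pure diffusion-formulated target.
    """
    return alpha * teacher_pred + (1.0 - alpha) * diffusion_target

def consistency_distillation_loss(student_out, teacher_pred,
                                  diffusion_target, alpha=0.5):
    # Mean-squared error between the student's output and the fused target,
    # standing in for the distillation objective described in the abstract.
    target = fused_target(teacher_pred, diffusion_target, alpha)
    return float(np.mean((student_out - target) ** 2))
```

This is only an illustration of one plausible fusion strategy; the actual blending rule and weighting schedule would be defined in the full method section.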