Keywords: Diffusion Models, Mixture of Experts, Efficient Foundation Model
TL;DR: We enhance diffusion models by distilling them into multiple students, yielding (a) higher sample quality through specialization on subsets of the data and (b) lower latency by distilling into smaller architectures, while retaining 1-step generation.
Abstract: Foundation models that generate photorealistic video with text or image guidance promise compelling augmented‑reality (AR) experiences, yet their prohibitive test‑time compute prevents true real‑time deployment. We focus on the dominant diffusion family and show that Multi-Student Distillation (MSD) increases effective model capacity while holding constant, or even reducing, latency, memory footprint, and energy per sample. MSD partitions the conditioning space and trains a lightweight one‑step generator per partition, allowing (i) higher sample quality at fixed latency and (ii) smaller per‑student backbones that fit edge and low-latency budgets.
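A minimal sketch of how the conditioning-space routing described above could look at inference time, assuming a KMeans partition over prompt embeddings and one callable one-step student per partition; all names and the clustering choice are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

class MultiStudentSampler:
    """Illustrative router: one lightweight one-step generator per partition."""

    def __init__(self, students, prompt_embeddings, n_partitions=4):
        # Partition the conditioning space, e.g. by clustering prompt embeddings
        # (an assumed partitioning scheme, chosen here for simplicity).
        self.router = KMeans(n_clusters=n_partitions, n_init=10).fit(prompt_embeddings)
        self.students = students  # list of per-partition one-step generators

    def generate(self, prompt_embedding, noise):
        # Route the condition to its partition, then run that student once
        # (single forward pass, i.e. 1-step generation).
        k = int(self.router.predict(prompt_embedding[None, :])[0])
        return self.students[k](prompt_embedding, noise)
```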
Submission Number: 11