Keywords: Diffusion, Scoring Rule, Distillation, VSD
Abstract: **Diffusion models** achieve remarkable generative performance but are hampered by slow, iterative inference. Model distillation seeks to train a fast student generator. **Variational Score Distillation (VSD)** offers a principled KL-divergence minimization framework for this task. This method cleverly avoids computing the teacher model's Jacobian, but its student gradient relies on the score of the student's own noisy marginal distribution, $\nabla_{\mathbf{x}_t} \log p_{\phi,t}(\mathbf{x}_t)$. VSD thus requires approximations, such as training an auxiliary network to estimate this score. These approximations can introduce biases, cause training instability, or lead to an incomplete match of the target distribution, potentially focusing on conditional means rather than broader distributional features.
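For concreteness, the VSD gradient is often written in roughly the following form (a sketch in the abstract's notation, with the student generator written $g_{\phi}$; the weighting $w(t)$, schedule $(\alpha_t, \sigma_t)$, and sign convention are our assumptions rather than details stated here), which makes explicit where the intractable student score enters:

$$\nabla_{\phi}\,\mathcal{L}_{\mathrm{VSD}} \;\approx\; \mathbb{E}_{t,\mathbf{z},\boldsymbol{\epsilon}}\Big[\, w(t)\,\big(\nabla_{\mathbf{x}_t}\log p_{\phi,t}(\mathbf{x}_t) - \nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t)\big)^{\top}\tfrac{\partial \mathbf{x}_t}{\partial \phi}\,\Big], \qquad \mathbf{x}_t = \alpha_t\, g_{\phi}(\mathbf{z}) + \sigma_t\,\boldsymbol{\epsilon}.$$

The second score is supplied by the frozen teacher, while the first, $\nabla_{\mathbf{x}_t}\log p_{\phi,t}(\mathbf{x}_t)$, must be approximated, e.g., by the auxiliary network mentioned above.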
We introduce **VarFlow**, a method based on a **Score-Rule Variational Distillation (SRVD)** framework. VarFlow trains a one-step generator $g_{\phi}(\mathbf{z})$ by directly minimizing an energy distance (derived from the strictly proper energy score) between the student's induced noisy data distribution $p_{\phi,t}(\mathbf{x}_t)$ and the teacher's target noisy distribution $q_t(\mathbf{x}_t)$. This objective is estimated entirely using samples from these two distributions. Crucially, VarFlow bypasses the need to compute or approximate the intractable student score. By directly matching the full noisy marginal distributions, VarFlow aims for a more comprehensive and robust alignment between student and teacher, offering an efficient and theoretically grounded path to high-fidelity one-step generation.
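To illustrate the sample-based objective described above, the sketch below estimates the energy distance $2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$ between student and teacher noisy samples. It is a minimal PyTorch illustration under assumed names (`generator`, `teacher_noisy`, `alpha_t`, `sigma_t`, `srvd_step`), not the paper's implementation.

```python
# Minimal sketch (not the paper's code): a sample-based energy-distance loss
# between the student's noisy marginal p_{phi,t} and the teacher's noisy
# marginal q_t. All names and the training-step structure are illustrative
# assumptions.
import torch

def energy_distance(x, y):
    """Estimate 2*E||X-Y|| - E||X-X'|| - E||Y-Y'|| from batches x ~ P, y ~ Q."""
    x, y = x.flatten(1), y.flatten(1)
    d_xy = torch.cdist(x, y).mean()   # cross term over all pairs
    d_xx = torch.pdist(x).mean()      # within-batch term, distinct pairs only
    d_yy = torch.pdist(y).mean()
    return 2.0 * d_xy - d_xx - d_yy

def srvd_step(generator, teacher_noisy, alpha_t, sigma_t, z_dim,
              batch=64, device="cpu"):
    """One training term: push one-step student samples through the forward
    noising kernel at level t and compare them, via the energy distance, with
    teacher samples drawn from q_t at the same level."""
    z = torch.randn(batch, z_dim, device=device)
    x0 = generator(z)                       # one-step student sample g_phi(z)
    eps = torch.randn_like(x0)
    x_t = alpha_t * x0 + sigma_t * eps      # sample from p_{phi,t}
    return energy_distance(x_t, teacher_noisy.to(device))
```

Because both expectations are estimated purely from samples, no score of $p_{\phi,t}$ (and hence no auxiliary score network) appears in the gradient; backpropagation flows only through the student samples.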
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 17107