Keywords: Diffusion models, Consistency Distillation, Memorization
Abstract: Diffusion models play a central role in modern generative modeling, and understanding how they balance memorization and generalization is critical for their reliability and practical use. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and memorization emerging at different stages of training. In practice, however, pretrained diffusion models often undergo distillation, an additional training phase whose impact on memorization is not well understood. In this work, we analyze how distillation reshapes memorization behavior in diffusion models, adopting the widely used consistency distillation as a representative framework. Empirically, we show that when applied to a teacher model that has memorized training data, distillation significantly reduces the memorization transferred to the student while simultaneously improving overall sample quality. To explain this, we provide a theoretical analysis based on a random feature neural network model (Bonnaire et al., 2025), showing that consistency distillation suppresses the unstable feature directions associated with memorization while preserving stable, generalizable modes. Our findings reveal the potential of distillation beyond acceleration: it can enhance generalization and the long-term trustworthiness of diffusion models.
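For readers unfamiliar with the distillation framework the abstract refers to, the following is a minimal, illustrative sketch of a consistency-distillation training step, not the paper's implementation. A student network is trained so that adjacent points on the teacher's probability-flow ODE trajectory map to the same output, with an EMA copy of the student providing the target. The toy architectures, the Euler solver step, the noise schedule, and all hyperparameters below are hypothetical placeholders.

```python
# Sketch of consistency distillation (assumptions: toy MLPs, random data,
# single-step Euler teacher solve; all names and constants are illustrative).
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Toy network conditioned on (x, t); stands in for a real diffusion model."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))

teacher = ScoreNet()      # pretrained (possibly memorizing) teacher, frozen
student = ScoreNet()      # consistency model f_theta being distilled
ema_student = ScoreNet()  # target network f_{theta^-}
ema_student.load_state_dict(student.state_dict())

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
ema_decay, sigma_min, sigma_max, N = 0.99, 0.002, 80.0, 18

def train_step(x0):
    # Sample adjacent noise levels t_n < t_{n+1} on a discretized schedule.
    n = torch.randint(0, N - 1, (x0.shape[0],)).float()
    t_n = sigma_min + n / (N - 1) * (sigma_max - sigma_min)
    t_np1 = sigma_min + (n + 1) / (N - 1) * (sigma_max - sigma_min)

    # Perturb clean data to the higher noise level.
    x_np1 = x0 + t_np1[:, None] * torch.randn_like(x0)

    # One Euler step of the teacher's probability-flow ODE down to t_n.
    with torch.no_grad():
        drift = teacher(x_np1, t_np1)  # toy stand-in for the ODE drift
        x_n = x_np1 + (t_n - t_np1)[:, None] * drift
        target = ema_student(x_n, t_n)  # f_{theta^-}(x_n, t_n)

    # Self-consistency loss: both trajectory points should map to one origin.
    loss = (student(x_np1, t_np1) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update of the target network.
    with torch.no_grad():
        for p, p_ema in zip(student.parameters(), ema_student.parameters()):
            p_ema.lerp_(p, 1.0 - ema_decay)
    return loss.item()

for step in range(100):
    batch = torch.randn(128, 2)  # placeholder for real training data
    train_step(batch)
```

In this setup, the teacher enters only through the single ODE step that produces the target point, which is one plausible reading of why unstable, memorization-linked directions in the teacher may be attenuated rather than copied into the student.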
Submission Number: 23