Keywords: Diffusion models, Distillation, Noise-data Pairings, Memorization
TL;DR: This paper investigates why single-step Distribution Matching Distillation (DMD) students spontaneously reproduce the teacher's exact noise-to-data mappings, despite being trained only to match the final data distribution.
Abstract: Distribution Matching Distillation (DMD) aligns noised distributions across scales to compress diffusion models. While DMD is theoretically pairing-agnostic, we identify an emergent copying phenomenon: high-dimensional students spontaneously reproduce the teacher’s original noise–data pairings. This behavior, absent in low-dimensional settings, is not an artifact of auxiliary losses or memorization. Instead, we argue that copying arises from the constrained geometric freedom of the student model during high-dimensional distillation.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 27