Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Spotlight
License: CC BY 4.0
Keywords: Diffusion models, Distillation, Noise-data Pairings, Memorization
TL;DR: This paper investigates why single-step Distribution Matching Distillation (DMD) students spontaneously reproduce the teacher's exact noise-to-data mappings, despite being trained only to match the final data distribution.
Abstract: Distribution Matching Distillation (DMD) compresses diffusion models into few-step generators by aligning the student's and teacher's distributions after noising them at multiple noise scales. Although DMD is theoretically pairing-agnostic (its objective constrains only the student's output distribution, not which noise sample maps to which data sample), we identify an emergent copying phenomenon: high-dimensional students spontaneously reproduce the teacher’s original noise–data pairings. This behavior, absent in low-dimensional settings, is not an artifact of auxiliary losses or memorization. Instead, we argue that copying arises from the constrained geometric freedom of the student model during high-dimensional distillation.
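The notion of a pairing-agnostic objective can be made concrete with a one-dimensional toy. The sketch below is an illustration under stated assumptions, not the paper's setup or metric: the affine `teacher_map`, the two hypothetical students, the noise levels, and the sorted-sample 1-D Wasserstein distance are all assumptions. The distance stands in for DMD's actual score-based objective, which it is not. Both students produce the same output distribution, so a distribution-level comparison across noise scales scores them equally, yet only one reproduces the teacher's noise–data pairing. A per-sample error on shared noise inputs tells them apart.

```python
import numpy as np


def teacher_map(z):
    # Hypothetical teacher: a deterministic noise-to-data map (affine, for illustration only).
    return 2.0 * z + 1.0


def copying_student(z):
    # Hypothetical student that reproduces the teacher's noise-data pairing exactly.
    return 2.0 * z + 1.0


def reflected_student(z):
    # Hypothetical student with the same output distribution (z ~ N(0, 1) is symmetric)
    # but a different pairing: each z is sent where the teacher sends -z.
    return -2.0 * z + 1.0


def w2_1d(a, b):
    # 1-D Wasserstein-2 distance between two equal-size empirical samples,
    # computed by matching sorted values (quantile coupling).
    return float(np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2)))


def multiscale_distribution_gap(student, teacher, z, sigmas, rng):
    # Stand-in for multi-scale distribution matching (NOT the DMD loss):
    # noise both output sets at each level and compare only their distributions.
    xs, xt = student(z), teacher(z)
    gaps = []
    for s in sigmas:
        noised_s = xs + s * rng.standard_normal(xs.shape)
        noised_t = xt + s * rng.standard_normal(xt.shape)
        gaps.append(w2_1d(noised_s, noised_t))
    return max(gaps)


def pairing_error(student, teacher, z):
    # Mean per-sample discrepancy on the SAME noise inputs; 0 means the student copies.
    return float(np.mean(np.abs(student(z) - teacher(z))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(200_000)
    sigmas = [0.1, 0.5, 1.0, 2.0]
    for name, student in [("copying", copying_student), ("reflected", reflected_student)]:
        gap = multiscale_distribution_gap(student, teacher_map, z, sigmas, rng)
        err = pairing_error(student, teacher_map, z)
        print(f"{name}: distribution gap={gap:.3f}, pairing error={err:.3f}")
```

Both students show a near-zero distribution gap, limited only by sampling noise. The pairing error is 0 for the copying student and about 4·E|z| ≈ 3.2 for the reflected one. The toy only shows that a distribution-level objective does not by itself single out the copying solution. The paper's observation concerns which solution trained high-dimensional DMD students actually reach.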
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 27