PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors

Published: 13 Jan 2026, Last Modified: 13 Jan 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Dataset distillation (DD) promises compact yet faithful synthetic data, but existing approaches often inherit the inductive bias of a single teacher model. As dataset size increases, this bias drives generation toward overly smooth, homogeneous samples, reducing intra-class diversity and limiting generalization. We present PRISM (PRIors from diverse Source Models), a framework that disentangles architectural priors during synthesis. PRISM decouples the logit-matching and regularization objectives, supervising them with different teacher architectures: a primary model for logits and a stochastic subset for batch-normalization (BN) alignment. On ImageNet-1K, PRISM consistently and reproducibly outperforms single-teacher methods (e.g., SRe2L) and recent multi-teacher variants (e.g., G-VBSM) at low- and mid-IPC regimes. The generated data also show significantly richer intra-class diversity, as reflected by a notable drop in cosine similarity between features. We further analyze teacher selection strategies (pre- vs. intra-distillation) and introduce a scalable cross-class batch formation scheme for fast parallel synthesis. Code: https://github.com/Brian-Moser/prism
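To make the decoupling concrete, below is a minimal, illustrative sketch of PRISM-style synthesis, not the authors' implementation. It assumes an SRe2L-style optimization loop with torchvision teachers; the teacher choices and the hyper-parameters `bn_weight` and `num_bn_teachers` are hypothetical. The key point it shows is that the logit-matching loss uses only the primary teacher, while the BN-alignment regularizer is computed on a randomly sampled subset of architecturally diverse teachers.

```python
# Sketch of decoupled supervision for dataset distillation (assumptions noted above).
import random
import torch
import torch.nn.functional as F
from torchvision import models


def bn_alignment_loss(model, x):
    """Match the batch statistics induced by x to the teacher's running BN statistics."""
    losses, hooks = [], []

    def make_hook(bn):
        def hook(module, inputs, output):
            feat = inputs[0]
            mean = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            losses.append(F.mse_loss(mean, module.running_mean)
                          + F.mse_loss(var, module.running_var))
        return hook

    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    model(x)                      # forward pass only to trigger the hooks
    for h in hooks:
        h.remove()
    return torch.stack(losses).sum()


# Primary teacher supplies the logit objective; a pool of diverse teachers
# supplies the BN-alignment signal (example architectures, chosen for illustration).
primary = models.resnet18(weights="IMAGENET1K_V1").eval()
bn_pool = [models.resnet50(weights="IMAGENET1K_V1").eval(),
           models.densenet121(weights="IMAGENET1K_V1").eval(),
           models.mobilenet_v2(weights="IMAGENET1K_V1").eval()]

x_syn = torch.randn(4, 3, 224, 224, requires_grad=True)   # synthetic image batch
y = torch.randint(0, 1000, (4,))                           # target classes
opt = torch.optim.Adam([x_syn], lr=0.1)
bn_weight, num_bn_teachers = 0.01, 2                       # assumed values

for step in range(10):
    opt.zero_grad()
    # (1) Logit supervision from the primary teacher only.
    loss = F.cross_entropy(primary(x_syn), y)
    # (2) BN alignment from a stochastic subset of the diverse teacher pool.
    for t in random.sample(bn_pool, num_bn_teachers):
        loss = loss + bn_weight * bn_alignment_loss(t, x_syn)
    loss.backward()
    opt.step()
```

Resampling the BN teacher subset at each step is what injects varied architectural priors into the regularizer while keeping the logit target stable, which is the mechanism the abstract credits for the improved intra-class diversity.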
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have added three new experiments and corresponding sections: ***(i)*** CIFAR-100 experiments (hyper-parameters in Appendix B) in Section 4.2, ***(ii)*** fixed vs. variable BN teacher sets in Section 4.6 (Table 5), and ***(iii)*** PRISM with a transformer backbone in Section 4.7 (Table 6).
Assigned Action Editor: ~Jaesik_Park3
Submission Number: 6487