Keywords: dataset distillation, diffusion model
Abstract: Dataset distillation has witnessed significant progress in synthesizing small-scale datasets that encapsulate rich information from large-scale original ones. Particularly, methods based on generative priors show promising performance, while maintaining computational efficiency and cross-architecture generalization. However, the generation process lacks explicit controllability for each sample. Previous distillation methods primarily match the real distribution from the perspective of the entire dataset, whereas overlooking conceptual completeness at the instance level. This oversight can result in missing or incorrectly represented object details and compromised dataset quality. To this end, we propose to incorporate the conceptual understanding of large language models (LLMs) to perform a CONCept-infORmed Diffusion process for dataset distillation, in short as CONCORD. Specifically, distinguishable and fine-grained concepts are retrieved based on category labels to explicitly inform the denoising process and refine essential object details. By integrating these concepts, the proposed method significantly enhances both the controllability and interpretability of the distilled image generation, without replying on pre-trained classifiers. We demonstrate the efficacy of CONCORD by achieving state-of-the-art performance on ImageNet-1K and its subsets. It further advances the practical application of dataset distillation methods. The code implementation is attached in the supplementary material.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1425
Loading