Keywords: Dataset Distillation, Dataset Condensation, Privacy Protection
TL;DR: We propose the task of "diving into distilled data" for the first time, achieving remarkable cross-architecture generalization on synthetic datasets in a raw-data-free and training-free manner.
Abstract: Dataset distillation aims to synthesize a compact proxy dataset whose images are unreadable and non-raw with respect to the original dataset, enabling privacy protection and highly efficient learning. However, previous approaches typically adopt a single-stage distillation paradigm that learns patterns specific to the prior architecture; this overfitting suppresses the expression of semantics and degrades performance across heterogeneous architectures. To address this issue, we propose a novel dual-stage distillation framework called ${\textbf{DIVER}}$, which leverages a pre-trained diffusion model to dive deeper into $\textbf{DI}$stilled data $\textbf{V}$ia $\textbf{E}$xpressive semantic $\textbf{R}$ecovery, a process of semantic inheritance, guidance, and fusion. Semantic inheritance distills the high-level semantic knowledge of the abstract distilled images into the latent space, filtering out architecture-specific ``noise'' while retaining the intrinsic semantics. Semantic guidance then preserves the original semantics more faithfully by directing the reverse diffusion procedure. Finally, semantic fusion injects conditional labels, together with the inherited and guided semantics, only during the concrete phase of the reverse process, compensating for the lack of category information in the original features. Extensive experiments validate the effectiveness and efficiency of our method: it improves classical distillation techniques, significantly strengthens cross-architecture generalization, and requires processing time comparable to the raw DiT on ImageNet (256$\times$256) with only 4.02 GB of GPU memory.
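The abstract outlines a three-part recovery pipeline (inheritance, guidance, fusion) built on a pre-trained latent diffusion model. The sketch below illustrates how such a pipeline could be wired together; it is not the authors' implementation. The encoder/decoder, `TinyEps` backbone, noise schedule, step thresholds, and guidance weight are all hypothetical stand-ins chosen only to make the three stages concrete and runnable.

```python
# Conceptual sketch of a DIVER-style dual-stage recovery (all components are
# illustrative stand-ins for a pre-trained latent diffusion model such as DiT).
import torch
import torch.nn as nn

T = 1000                      # total diffusion steps (assumed schedule)
t_inherit = 600               # how far to noise the distilled latents (inheritance)
t_fusion = 200                # below this step, class labels are injected (fusion)
betas = torch.linspace(1e-4, 2e-2, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

encoder = nn.Conv2d(3, 4, 8, stride=8)           # stand-in for a pre-trained VAE encoder
decoder = nn.ConvTranspose2d(4, 3, 8, stride=8)  # stand-in for the paired decoder

class TinyEps(nn.Module):
    """Stand-in for a pre-trained class-conditional diffusion backbone."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Conv2d(4, 4, 3, padding=1)
        self.label_emb = nn.Embedding(num_classes + 1, 4)  # last index = "no label"
    def forward(self, z_t, t, y):
        cond = self.label_emb(y)[:, :, None, None]
        return self.net(z_t + cond)

eps_model = TinyEps()

def diver_recover(distilled_images, labels, guidance_weight=0.5):
    # 1) Semantic inheritance: map abstract distilled images into the latent
    #    space and noise them partway, discarding architecture-specific detail.
    z0 = encoder(distilled_images)
    noise = torch.randn_like(z0)
    z = alphas_bar[t_inherit].sqrt() * z0 + (1 - alphas_bar[t_inherit]).sqrt() * noise

    null_label = torch.full_like(labels, eps_model.label_emb.num_embeddings - 1)
    for t in range(t_inherit, 0, -1):
        t_idx = torch.full((z.size(0),), t - 1, dtype=torch.long)
        # 3) Semantic fusion: condition on class labels only in the late,
        #    "concrete" phase of the reverse process.
        y = labels if t <= t_fusion else null_label
        eps = eps_model(z, t_idx, y)
        # 2) Semantic guidance: pull the predicted clean latent toward the
        #    inherited semantics z0 so the original content is preserved.
        z0_hat = (z - (1 - alphas_bar[t - 1]).sqrt() * eps) / alphas_bar[t - 1].sqrt()
        z0_hat = z0_hat + guidance_weight * (z0 - z0_hat)
        # DDIM-style deterministic step back to t-1.
        a_prev = alphas_bar[t - 2] if t > 1 else torch.tensor(1.0)
        z = a_prev.sqrt() * z0_hat + (1 - a_prev).sqrt() * eps
    return decoder(z)

# Usage: recover semantically expressive images from 10 distilled samples.
imgs = torch.randn(10, 3, 64, 64)
ys = torch.arange(10)
recovered = diver_recover(imgs, ys)
print(recovered.shape)  # torch.Size([10, 3, 64, 64])
```

Because the pipeline only runs the reverse diffusion of a frozen pre-trained model on already-distilled images, it stays raw-data-free and training-free, consistent with the TL;DR above.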
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10876