Safeguarding Visual Privacy in Dataset Distillation: Robust Initialization via Augmentation

11 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Dataset Distillation
Abstract: Dataset distillation synthesizes small datasets that enable models to achieve accuracy comparable to training on the original full data, yielding substantial training efficiency gains. In addition, distilled data have been used for privacy-preserving applications, especially to mitigate membership inference attacks (MIA), where adversaries query a model to decide whether a sample was in its training set. However, we are the first to show that state-of-the-art dataset distillation leaks visual privacy. Distilled images can be visually consistent with private originals, as measured by LPIPS, thereby leaking sensitive information. We theoretically trace this risk to the common practice of initializing distilled images with original samples. To counter this, we propose Kaleidoscopic Transformation (KT), a plug-and-play module that applies aggregated, strong yet semantics-preserving perturbations to selected original images at initialization. Extensive experiments demonstrate that KT consistently strengthens resistance to MIA and improves visual privacy, while maintaining competitive downstream accuracy. Our code will be publicly available.
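The abstract describes KT as applying aggregated, strong yet semantics-preserving perturbations to selected original images before they seed the distillation process. The paper's actual transformations are not specified here, so the following is only a hedged, minimal sketch of what such an initialization could look like: several strong perturbations (channel-wise color jitter, random crop-and-resize, additive noise) are applied to an image and the resulting views are aggregated. The function name `kaleidoscopic_transform` and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kaleidoscopic_transform(img, rng, n_views=4):
    """Hypothetical KT-style initialization sketch: aggregate several
    strong but semantics-preserving perturbations of one original image.
    `img` is an HxWxC float array with values in [0, 1]."""
    h, w, c = img.shape
    views = []
    for _ in range(n_views):
        v = img.copy()
        # Channel-wise color jitter (strong, but keeps object identity).
        v = v * rng.uniform(0.6, 1.4, size=(1, 1, c))
        # Random crop, then resize back via nearest-neighbour indexing.
        ch, cw = int(h * 0.8), int(w * 0.8)
        y0 = rng.integers(0, h - ch + 1)
        x0 = rng.integers(0, w - cw + 1)
        crop = v[y0:y0 + ch, x0:x0 + cw]
        yi = np.arange(h) * ch // h
        xi = np.arange(w) * cw // w
        v = crop[yi][:, xi]
        # Additive Gaussian noise as a further strong perturbation.
        v = v + rng.normal(0.0, 0.05, size=v.shape)
        views.append(np.clip(v, 0.0, 1.0))
    # Aggregate the perturbed views into a single initialization image,
    # so no single distilled seed is pixel-aligned with the original.
    return np.mean(views, axis=0)

rng = np.random.default_rng(0)
original = rng.random((32, 32, 3))
init = kaleidoscopic_transform(original, rng)
print(init.shape)  # (32, 32, 3)
```

Under this reading, the aggregated image would then replace the raw original as the starting point of any standard distillation method, which is consistent with the paper's claim that KT is plug-and-play.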
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3979