Keywords: Differential Privacy, Dataset Distillation
TL;DR: We show that differentially private dataset distillation can outperform DP-SGD for private image classification
Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD), which iteratively perturbs clipped per-sample gradients and tracks the cumulative privacy risk via composition accounting, has become a cornerstone of private deep learning. Despite its versatility, DP-SGD faces several practical limitations: it is constrained by the number of gradient iterations permissible under a limited privacy budget, it is incompatible with common deep learning techniques such as ensembling and BatchNorm, and it typically produces only a single trained model. In this work, we propose an algorithm for generating a differentially private (DP) synthetic version of a sensitive dataset. The synthetic dataset can be distributed and post-processed freely without additional privacy loss, offering more flexibility than DP-SGD. Building on dataset distillation, which produces compact synthetic datasets that preserve downstream performance, we introduce SPS (Summarize–Privatize–Synthesize) and its enhanced variant SPS+. In contrast to prior work, SPS is, to our knowledge, the first alternative to DP-SGD that attains higher accuracy on image-classification tasks. Concretely, on CIFAR-10/CIFAR-100 with privacy budget $\epsilon=1$, SPS+ achieves 96.2%/76.6% top-1 accuracy, outperforming the state-of-the-art (SOTA) DP-SGD results of 94.8%/70.3%.
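For readers unfamiliar with the baseline, the abstract's description of DP-SGD (clip each per-sample gradient, average, then perturb with Gaussian noise) can be sketched as follows. This is a minimal illustrative sketch with hypothetical names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`), not code from the paper, and it omits the composition accounting the abstract mentions.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """One noisy DP-SGD update direction (illustrative sketch).

    Each per-sample gradient is rescaled so its L2 norm is at most
    clip_norm, the clipped gradients are averaged, and Gaussian noise
    calibrated to the clipping norm is added to the average.
    """
    clipped = [
        g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        for g in per_sample_grads
    ]
    mean = np.mean(clipped, axis=0)
    # Noise std scales with clip_norm (the per-sample sensitivity)
    # and shrinks with the batch size after averaging.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_sample_grads),
        size=mean.shape,
    )
    return mean + noise

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
noisy_update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Because every update is privatized, the privacy cost grows with the number of steps, which is the iteration-budget constraint the abstract points out; releasing a DP synthetic dataset instead pays the privacy cost once and allows unlimited post-processing.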
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22721