To alleviate the utility degradation of deep learning classification under differential privacy (DP), the use of extra public data or pre-trained models has been widely explored. Recently, in-distribution public data has been investigated, where a tiny subset of data owners share their data publicly. In this paper, to mitigate memorization of and overfitting to the limited in-distribution public data, we leverage recent diffusion models and employ various augmentation techniques to improve diversity. We then explore optimization that discovers flat minima with respect to the public data and propose weight multiplicity to enhance the generalization of private training. Assuming 4% of the training data is public, our method brings significant performance gains even without pre-trained models, achieving 85.78% on CIFAR-10 with a privacy budget of $\varepsilon=2$ and $\delta=10^{-5}$.
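As a rough illustration of the flat-minima search on the public subset, the sketch below applies a SAM-style two-step update (ascent to a perturbed point, then descent from it) on a public mini-batch. This is a minimal sketch under the assumption that a sharpness-aware procedure is used; the function name `sam_public_step`, the radius `rho`, and the surrounding training loop are illustrative and not taken from the paper.

```python
# Hypothetical sketch: SAM-style flat-minima step on a public mini-batch.
# The exact optimization used in the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sam_public_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                    x_pub: torch.Tensor, y_pub: torch.Tensor,
                    rho: float = 0.05) -> float:
    """One sharpness-aware update on public data (illustrative only)."""
    # 1) Ascent step: compute gradients and perturb weights toward the
    #    locally sharpest direction within an L2 ball of radius rho.
    loss = F.cross_entropy(model(x_pub), y_pub)
    loss.backward()
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm(2) for p in model.parameters() if p.grad is not None]), 2)
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                 # move to the perturbed point w + e
            perturbations.append(e)
    optimizer.zero_grad()

    # 2) Descent step: gradient at the perturbed point, then restore weights
    #    and apply the sharpness-aware gradient at the original point.
    F.cross_entropy(model(x_pub), y_pub).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)             # back to the original weights w
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a combined pipeline, a step of this kind on the small public subset would typically be interleaved with standard DP-SGD steps (per-example gradient clipping plus noise) on the private data; that interleaving schedule is an assumption here, not a detail stated in the abstract.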