DP-ImgSyn: Dataset Alignment for Obfuscated, Differentially Private Image Synthesis

TMLR Paper1703 Authors

19 Oct 2023 (modified: 15 Jan 2024) · Rejected by TMLR
Abstract: The availability of abundant data has catalyzed the expansion of deep learning vision algorithms. However, certain vision datasets depict visually sensitive content, such as content moderation images. Sharing or releasing these datasets to the community would improve the performance of neural models but poses moral and ethical questions. Thus, there is a need to share such datasets with privacy guarantees and without sharing visually sensitive data. Traditionally, Generative Adversarial Networks (GANs) with Differential Privacy (DP) guarantees are employed to generate and release data. However, GAN-based approaches produce images that are visually similar to the private images. In this paper, we propose a non-generative framework, Differentially Private Image Synthesis (DP-ImgSyn), to sanitize and release visually sensitive data with DP guarantees and address these issues. DP-ImgSyn consists of the following steps. First, a teacher model is trained for classification using a DP training algorithm. Second, optimization is performed on a public dataset using the teacher model to align it with the private dataset. We show that this alignment improves performance (by up to $\approx$ 17%) and ensures that the generated/aligned images are visually similar to the public images. The optimization uses the teacher network's batch normalization layer statistics (mean, standard deviation) to inject information about the private images into the public images. The synthesized images, with their corresponding soft labels obtained from the teacher model, are released as the sanitized dataset. A student model is then trained on the released dataset using a KL-divergence loss. The proposed framework circumvents the issues of generative methods and produces images visually similar to the public dataset, thereby obfuscating the private dataset with the public dataset.
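The alignment objective described above can be illustrated as a batch-normalization-statistics matching loss, in the spirit of feature-statistics inversion methods. The sketch below is a minimal NumPy illustration under assumed inputs: `features` is a hypothetical list of per-BN-layer activations computed on the public images, and `bn_means`/`bn_vars` are the teacher's stored running statistics; the paper's exact loss and optimizer may differ.

```python
import numpy as np

def bn_alignment_loss(features, bn_means, bn_vars):
    """Sum over BN layers of the squared distance between the batch
    statistics of the (public) images and the teacher's running
    statistics. A sketch of the matching objective, not the authors'
    exact implementation.
    """
    loss = 0.0
    for f, mu, var in zip(features, bn_means, bn_vars):
        # f: (N, C, H, W) feature map entering a BN layer;
        # reduce over batch and spatial dims to get per-channel stats.
        batch_mu = f.mean(axis=(0, 2, 3))
        batch_var = f.var(axis=(0, 2, 3))
        loss += np.sum((batch_mu - mu) ** 2) + np.sum((batch_var - var) ** 2)
    return float(loss)
```

In practice this loss would be minimized by gradient descent on the public image pixels (using an autodiff framework), injecting information about the private data, which the BN statistics summarize, into the public images.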
Our experiments on various vision datasets show that when using similar DP training mechanisms, our framework performs better than generative techniques (up to $\approx$ 20%).
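The student-training step, distilling from the released soft labels with a KL-divergence loss, can be sketched as follows. This is a generic NumPy illustration of knowledge distillation, with `student_logits`, `teacher_probs`, and the temperature `T` as assumed names; it is not the authors' exact training code.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_kl_loss(student_logits, teacher_probs, T=1.0):
    """KL(teacher || student), averaged over the batch. teacher_probs
    are the released soft labels; a sketch of the distillation
    objective, not the paper's exact loss.
    """
    p = teacher_probs
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

When the student's predictive distribution matches the teacher's soft labels, the loss is zero; minimizing it transfers the teacher's (DP-trained) knowledge without exposing the private images.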
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The requested changes have been highlighted in yellow in the revised manuscript. In the supplementary material, in addition to the previously provided code, we have included a video visualizing the proposed method on a real example.
Assigned Action Editor: ~bo_han2
Submission Number: 1703