Black-Box Privacy Attacks Against GANs via Detector Networks

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: privacy attacks, generative models, generative adversarial networks, membership inference
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We design black-box privacy attacks against a range of GANs by leveraging detector networks trained to recognize GAN-generated samples.
Abstract: Since their inception, Generative Adversarial Networks (GANs) have been popular generative models for various data types, including images, audio, video, and tabular data. One promising application of generative models like GANs is to share restricted or sensitive data with third parties by releasing synthetic data or the model itself, rather than the underlying data. However, recent research on diffusion models has highlighted privacy vulnerabilities in this approach -- namely, that the models memorize significant quantities of the training data and that existing membership inference attacks can identify generated samples as training points. This paper investigates the privacy implications of using GANs in black-box settings, where adversaries only have access to samples from the generator, rather than access to the discriminator as is often assumed in prior work. We introduce a suite of membership inference attacks against GANs in the black-box setting and evaluate our attacks on image GANs trained on the CIFAR10 dataset and tabular GANs trained on genomic data. Our most successful attack, called The Distinguisher, involves training a second network to score samples based on their likelihood of having been generated by the GAN rather than drawn from the true distribution. A key insight is that a network capable of distinguishing GAN-generated samples from true distribution samples can also distinguish training samples from the distribution. Our main findings indicate that across various GAN architectures and data types, adversaries can orchestrate non-trivial privacy attacks when given access to samples from the generator. However, the observed privacy leakage in GANs appears to be lower than in other generative and discriminative models.
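A minimal sketch of how a Distinguisher-style attack could be set up, assuming a binary detector trained on GAN samples versus auxiliary real samples and a thresholded "GAN-likeness" score as the membership signal; all names, network sizes, data placeholders, and the scoring direction below are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical "Distinguisher"-style membership inference sketch (PyTorch).
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM = 3 * 32 * 32  # flattened CIFAR-10-sized inputs (assumption)

# Stand-in data: in practice these would come from the target GAN's generator
# and an auxiliary set of real samples from the underlying distribution.
gan_samples = torch.randn(2048, DIM)    # label 1: generated by the GAN
real_samples = torch.randn(2048, DIM)   # label 0: drawn from the true distribution
candidates = torch.randn(64, DIM)       # points whose membership we want to infer

# Detector network that scores how "GAN-like" a sample looks.
detector = nn.Sequential(
    nn.Linear(DIM, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.cat([gan_samples, real_samples])
y = torch.cat([torch.ones(len(gan_samples), 1), torch.zeros(len(real_samples), 1)])

for epoch in range(5):
    perm = torch.randperm(len(x))
    for i in range(0, len(x), 256):
        idx = perm[i:i + 256]
        opt.zero_grad()
        loss = loss_fn(detector(x[idx]), y[idx])
        loss.backward()
        opt.step()

# Membership score: the abstract's insight is that training points tend to look
# more "GAN-like" to such a detector than fresh distribution samples, so a higher
# score is treated as evidence of membership (the threshold here is illustrative).
with torch.no_grad():
    scores = torch.sigmoid(detector(candidates)).squeeze(1)
predicted_members = scores > scores.median()
```

In practice the detector architecture, the size and source of the auxiliary real set, and the decision threshold would all be evaluation choices reported in the paper rather than the fixed values used above.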
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6837