Classifier-to-Generator Attack: Estimation of Training Data Distribution from Classifier


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Suppose a deep classification model is trained with samples that need to be kept private for privacy or confidentiality reasons. In this setting, can an adversary obtain the private samples if the classification model is given to the adversary? We call this reverse engineering against the classification model the Classifier-to-Generator (C2G) Attack. This situation arises when the classification model is embedded into mobile devices for offline prediction (e.g., object recognition for the automatic driving car and face recognition for mobile phone authentication). For C2G attack, we introduce a novel GAN, PreImageGAN. In PreImageGAN, the generator is designed to estimate the the sample distribution conditioned by the preimage of classification model $f$, $P(X|f(X)=y)$, where $X$ is the random variable on the sample space and $y$ is the probability vector representing the target label arbitrary specified by the adversary. In experiments, we demonstrate PreImageGAN works successfully with hand-written character recognition and face recognition. In character recognition, we show that, given a recognition model of hand-written digits, PreImageGAN allows the adversary to extract numeric images without knowing that the model is built for numeric images. In face recognition, we show that, when an adversary obtains a face recognition model for a set of individuals, PreImageGAN allows the adversary to extract face images of specific individuals contained in the set, even when the adversary has no knowledge of the face of the individuals.
  • TL;DR: Estimation of training data distribution from trained classifier using GAN.
  • Keywords: Security, Privacy, Model Publication, Generative Adversarial Networks