Keywords: Model stealing, deep neural networks, adversarial attacks
TL;DR: Input only noise, glean the softmax outputs, steal the weights
Abstract: This paper explores the scenarios under which
an attacker can claim that ‘Noise and access to
the softmax layer of the model is all you need’
to steal the weights of a convolutional neural network
whose architecture is already known. We
were able to achieve 96% test accuracy with
the stolen MNIST model and 82% test accuracy with
the stolen KMNIST model, both learned using only
i.i.d. Bernoulli noise inputs. We posit that this
theft-susceptibility of the weights is indicative
of the complexity of the dataset and propose a
new metric that captures it. The goal of
this dissemination is not just to showcase how far
knowing the architecture can take an attacker in
model stealing, but also to draw attention to this
rather idiosyncratic weight-learnability aspect of
CNNs spurred by i.i.d. noise inputs. We also disseminate
some initial results obtained by using
the Ising probability distribution in lieu of the i.i.d.
Bernoulli distribution.
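
A minimal sketch of the query-and-distill loop implied by the abstract, assuming a PyTorch setup: the attacker instantiates the known architecture, feeds i.i.d. Bernoulli noise images to the victim, records the victim's softmax outputs, and trains a clone to match them with a KL-divergence loss. The SmallCNN architecture, Bernoulli parameter, input size, and optimizer settings below are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the noise-based model-stealing loop described in the abstract.
# Architecture, hyperparameters, and noise parameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A known CNN architecture shared by victim and clone (assumed)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

# In practice the victim is a trained model (e.g., on MNIST/KMNIST) whose
# weights the attacker wants to recover; only its softmax outputs are visible.
victim = SmallCNN().eval()
clone = SmallCNN()                  # attacker's copy of the known architecture
opt = torch.optim.Adam(clone.parameters(), lr=1e-3)

for step in range(10_000):
    # i.i.d. Bernoulli(0.5) noise images, 28x28 as in MNIST/KMNIST (assumed p and size)
    x = torch.bernoulli(0.5 * torch.ones(64, 1, 28, 28))
    with torch.no_grad():
        target = F.softmax(victim(x), dim=1)   # gleaned softmax outputs
    # Train the clone so its softmax matches the victim's on the noise queries
    loss = F.kl_div(F.log_softmax(clone(x), dim=1), target, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, the clone would be evaluated on the real test set (MNIST or KMNIST) to measure how much of the victim's behavior the noise queries alone recovered.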