Keywords: Data reconstruction attack, informed adversary, LIME, ReLU
Abstract: Recent work has shown that an adversary can reconstruct training examples given access to the parameters of a deep learning image classification model. We show that the quality of reconstruction depends heavily on the type of activation function used. In particular, ReLU activations lead to much lower-quality reconstructions than smooth activation functions. We explore whether this phenomenon is a fundamental property of models with ReLU activations or a weakness of current attack strategies. We first study the training dynamics of small MLPs with ReLU activations and identify redundant model parameters that do not memorise training examples. Building on this, we propose the Mnemonist method, which detects redundant model parameters and then guides current attacks to focus on informative parameters, improving the quality of training-example reconstructions from ReLU models.
Other Supplementary Material: zip