Quantifying Adversarial Sensitivity of a Model as a Function of the Image Distribution

Anonymous

09 Oct 2020 (modified: 05 May 2023) · Submitted to SVRHM@NeurIPS · Readers: Everyone
Keywords: Adversarial Robustness, Image Statistics, Explainable Machine Learning, Empirical Analysis
TL;DR: We empirically verify a set of results pertaining to the role of image statistics in the adversarial sensitivity of a model through an adaptation of an existing performance metric.
Abstract: In this paper, we propose an adaptation of the area under the curve (AUC) metric to measure the adversarial robustness of a model over a particular $\epsilon$-interval $[\epsilon_0, \epsilon_1]$ (an interval of adversarial perturbation strengths) that facilitates comparisons across models even when they differ in initial $\epsilon_0$ performance. This metric can be used to determine how adversarially sensitive a model is to different image distributions, and to measure how robust a model is relative to other models on the same distribution. We applied this adversarial robustness metric on MNIST, CIFAR-10, and a Fusion dataset (CIFAR-10 + MNIST), where trained models performed either a digit or object recognition task using a LeNet, ResNet50, or fully connected network (FullyConnectedNet) architecture, and found the following: 1) CIFAR-10 models are more adversarially sensitive than MNIST models; 2) pretraining on another image distribution \textit{sometimes} carries over the adversarial sensitivity induced by that distribution, contingent on the pretrained image manifold; 3) increasing the complexity of the image manifold increases the adversarial sensitivity of a model trained on that manifold, though the task also plays a role in the sensitivity. Collectively, our results imply non-trivial differences in the learned representation spaces of perceptual systems exposed to different image statistics (mainly objects vs. digits). Moreover, these results hold even when model systems are equalized to the same level of performance, or when exposed to matched image statistics of fusion images but with different tasks.
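The abstract specifies only that the metric is an AUC adaptation over an $\epsilon$-interval that corrects for differing $\epsilon_0$ performance; the exact formula is not given. Below is a minimal sketch of one plausible form, assuming trapezoidal integration of an accuracy-vs-$\epsilon$ curve, averaged over the interval width and normalized by the accuracy at $\epsilon_0$ (the function name and normalization choice are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def robustness_auc(epsilons, accuracies, normalize=True):
    """Mean accuracy over [eps_0, eps_1] via trapezoidal integration.

    Assumption: dividing by the accuracy at eps_0 puts models with
    different initial performance on a comparable scale, as the paper's
    adaptation is described as doing (exact form not in the abstract).
    """
    epsilons = np.asarray(epsilons, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    # Trapezoidal rule: sum of interval widths times mean endpoint heights.
    area = np.sum(np.diff(epsilons) * (accuracies[1:] + accuracies[:-1]) / 2)
    # Divide by interval width so the score is a mean accuracy in [0, 1].
    score = area / (epsilons[-1] - epsilons[0])
    if normalize:
        # Normalize by initial (eps_0) accuracy to compare across models.
        score /= accuracies[0]
    return score

# Example: a model whose accuracy decays linearly from 1.0 to 0.0
# scores 0.5; a model whose accuracy never degrades scores 1.0.
print(robustness_auc([0.0, 0.1, 0.2], [1.0, 0.5, 0.0]))
print(robustness_auc([0.0, 0.1, 0.2], [0.8, 0.8, 0.8]))
```

Under this sketch, a score near 1 means the model retains its $\epsilon_0$ accuracy across the whole perturbation interval, regardless of what that starting accuracy was.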