- Keywords: Adversarial Examples, Generative Models
- TL;DR: We show analytically and empirically that the Bayes-optimal classifiers are, in some settings, vulnerable to adversarial examples. We then show that even when the optimal classifier is robust, trained CNNs are vulnerable.
- Abstract: Adversarial attacks on CNN classifiers can make an imperceptible change to an input image and alter the classification result. The source of these failures is still poorly understood, and many explanations invoke the "unreasonably linear extrapolation" used by CNNs along with the geometry of high dimensions. In this paper we show that similar attacks can be used against the Bayes-Optimal classifier for certain class distributions, while for others the optimal classifier is robust to such attacks. We present analytical results showing conditions on the data distribution under which all points can be made arbitrarily close to the optimal decision boundary and show that this can happen even when the classes are easy to separate, when the ideal classifier has a smooth decision surface and when the data lies in low dimensions. We introduce new datasets of realistic images of faces and digits where the Bayes-Optimal classifier can be calculated efficiently and show that for some of these datasets the optimal classifier is robust and for others it is vulnerable to adversarial examples. In systematic experiments with many such datasets, we find that standard CNN training consistently finds a vulnerable classifier even when the optimal classifier is robust while large-margin methods often find a robust classifier with the exact same training data. Our results suggest that adversarial vulnerability is not an unavoidable consequence of machine learning in high dimensions, and may often be a result of suboptimal training methods used in current practice.