Abstract: Machine learning models, including both traditional models and neural networks, can be easily fooled by adversarial examples, which are generated by applying small perturbations to natural examples. This poses a critical challenge to machine learning security and impedes the wide application of machine learning in many important domains, such as computer vision and malware detection. Unfortunately, even state-of-the-art defense approaches such as adversarial training and defensive distillation still suffer from major limitations and can be circumvented. From a different angle, we investigate two research questions in this paper: Are adversarial examples distinguishable from natural examples? Are adversarial examples generated by different methods distinguishable from each other? Both questions concern the distinguishability of adversarial examples. Answering them can potentially lead to a simple yet effective approach for protecting against adversarial examples, which we term defensive distinction and formulate as multi-label classification. We design and perform experiments on the MNIST dataset to investigate these two questions, and obtain highly positive results that demonstrate the strong distinguishability of adversarial examples. We recommend that defensive distinction be seriously considered as a complement to other defense approaches.
Keywords: Adversarial Examples, Machine Learning, Neural Networks, Distinguishability, Defense
TL;DR: We propose defensive distinction, an approach for protecting against adversarial examples, and demonstrate the strong distinguishability of adversarial examples.
Data: [MNIST](https://paperswithcode.com/dataset/mnist)
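The two research questions in the abstract can both be posed as a classification task over the source of each example. The sketch below is one possible way to assemble such a labeled dataset, not the authors' implementation: the function name `build_distinction_dataset`, the method names `method_a` and `method_b`, the integer labeling scheme (0 for natural, k for the k-th generation method), and the random toy arrays standing in for MNIST images and real attacks are all assumptions made for illustration.

```python
import numpy as np


def build_distinction_dataset(natural, adversarial_by_method):
    """Stack natural and adversarial examples with source labels.

    Label 0 marks natural examples; label k (k >= 1) marks examples from
    the k-th generation method, taken in sorted name order. This labeling
    scheme is an illustrative assumption, not taken from the paper.
    """
    method_names = sorted(adversarial_by_method)
    natural = np.asarray(natural)
    parts = [natural]
    labels = [np.zeros(len(natural), dtype=np.int64)]
    for k, name in enumerate(method_names, start=1):
        examples = np.asarray(adversarial_by_method[name])
        parts.append(examples)
        labels.append(np.full(len(examples), k, dtype=np.int64))
    class_names = ["natural"] + method_names
    return np.concatenate(parts), np.concatenate(labels), class_names


# Toy stand-ins: random 28x28 "images" and random perturbations,
# used only to show the shapes and labels involved.
rng = np.random.default_rng(0)
natural = rng.random((4, 28, 28))
adversarial = {
    # Hypothetical method names; real attacks would be substituted here.
    "method_a": np.clip(
        natural + 0.1 * np.sign(rng.standard_normal(natural.shape)), 0.0, 1.0
    ),
    "method_b": np.clip(
        natural[:2] + 0.05 * rng.standard_normal((2, 28, 28)), 0.0, 1.0
    ),
}

x, y, class_names = build_distinction_dataset(natural, adversarial)
```

Under this labeling, separating label 0 from all other labels corresponds to the first question, and separating the nonzero labels from one another corresponds to the second. Any standard classifier could then be trained on `x` and `y`.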