Keywords: adversarial learning, reverse engineering, deep learning, neural network
Abstract: Deep neural networks have achieved remarkable performance in many areas, including image classification tasks. However, numerous studies have shown that they are vulnerable to adversarial examples: images carefully crafted to fool well-trained deep neural networks by adding imperceptible perturbations to the original inputs. To better understand the inherent characteristics of adversarial attacks, we study the features of three common attack families: gradient-based, score-based, and decision-based. In this paper, we demonstrate that, given adversarial examples, attacks from different families can be successfully identified with a simple model. To investigate why this is possible, we further study the perturbation patterns of the different attacks through carefully designed experiments. Experimental results on CIFAR-10 and Tiny ImageNet confirm that attacks from different families produce distinct distortion patterns.
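The abstract does not specify the classifier or the attack implementations, so the following is only a minimal sketch of the general idea: generate adversarial examples (here FGSM stands in for the gradient-based family; score-based and decision-based attacks would be generated analogously with an attack library of choice), then train a small "simple model" on the perturbations to predict the attack family. All names, architectures, and hyperparameters below are illustrative assumptions, not the authors' method.

```python
# Sketch only (assumed setup, not the paper's code): classify the attack
# family from the perturbation pattern delta = x_adv - x.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Gradient-based attack (FGSM): a single signed-gradient step."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

class AttackFamilyClassifier(nn.Module):
    """A deliberately simple CNN that takes the perturbation (x_adv - x)
    and predicts which attack family produced it (3 classes here:
    gradient-, score-, and decision-based)."""
    def __init__(self, num_families=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(16 * 4 * 4, num_families)

    def forward(self, delta):
        return self.head(self.features(delta).flatten(1))

# Usage sketch: for each attack, collect delta = x_adv - x with a family
# label (0 = gradient-based, 1 = score-based, 2 = decision-based) and
# train AttackFamilyClassifier with standard cross-entropy on these pairs.
```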
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning