A Simple Unsupervised Data Depth-based Method to Detect Adversarial Images

22 Sept 2022 (modified: 13 Feb 2023)
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Adversarial attacks, Detection, Vision transformers, Safety AI
TL;DR: We propose a simple detection method for adversarial samples, based on data depths, that is designed specifically for vision transformer architectures.
Abstract: Deep neural networks suffer from critical robustness vulnerabilities, which limit their use in many real-world applications. In particular, a serious concern is their inability to defend against adversarial attacks. Although the research community has developed a large number of effective attacks, the detection problem has received little attention. Existing detection methods rely either on additional training or on specific heuristics, at the risk of overfitting. Moreover, they have mainly focused on ResNet architectures, while transformers, which are state-of-the-art for vision tasks, have not been properly investigated. In this paper, we overcome these limitations by introducing APPROVED, a simple unsupervised detection method for transformer architectures. It leverages the information available in the logit layer and computes a similarity score with respect to the training distribution. This is accomplished using a data depth that is (i) computationally efficient and (ii) non-differentiable, making it harder for gradient-based adversaries to craft malicious samples. Our extensive experiments show that APPROVED consistently outperforms previous detectors on CIFAR10, CIFAR100 and Tiny ImageNet.
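To illustrate the idea of scoring a logit vector by its depth with respect to training logits, here is a minimal sketch. The abstract does not say which data depth APPROVED uses, so this example assumes a random-projection approximation of a rank-based depth; the function name `projection_depth_score`, the parameters `n_proj` and `seed`, and the synthetic "logits" are all hypothetical and are not taken from the paper.

```python
import numpy as np


def projection_depth_score(x, reference, n_proj=256, seed=0):
    """Approximate rank-based depth of x with respect to the rows of reference.

    Assumption (not from the paper): depth is estimated by averaging, over
    random unit directions, the smaller of the two empirical tail fractions
    of the projected reference points around the projected query point.
    Higher scores mean x is more central; low scores suggest an outlier.
    """
    rng = np.random.default_rng(seed)
    dim = reference.shape[1]

    # Draw random directions and normalise them onto the unit sphere.
    directions = rng.normal(size=(n_proj, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    ref_proj = reference @ directions.T  # shape (n_reference, n_proj)
    x_proj = x @ directions.T            # shape (n_proj,)

    # Fraction of reference points on each side of x along each direction.
    frac_below = (ref_proj <= x_proj).mean(axis=0)
    frac_above = (ref_proj >= x_proj).mean(axis=0)

    # Ranks are piecewise constant in x, so the score has no useful gradient.
    return float(np.minimum(frac_below, frac_above).mean())


# Synthetic stand-ins for logits; a real pipeline would use model outputs.
rng = np.random.default_rng(1)
train_logits = rng.normal(size=(500, 10))
typical_logits = np.zeros(10)
unusual_logits = 10.0 * np.ones(10)

typical_score = projection_depth_score(typical_logits, train_logits)
unusual_score = projection_depth_score(unusual_logits, train_logits)
print(typical_score, unusual_score)
```

In a detector along these lines, an input would be flagged as adversarial when its score falls below a threshold chosen on clean data; how that threshold is set, and whether the reference set is taken per predicted class, are design choices the abstract leaves unspecified.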
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)
Supplementary Material: zip
