Vulnerable Region Discovery through Diverse Adversarial Examples

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Explainable DNNs, Adversarial Examples, Vulnerable Regions
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Deep Neural Networks (DNNs) have shown great promise in many fields, but ensuring their reliability remains a challenge. Current explainability approaches for DNNs mainly aim to understand model behavior by identifying and prioritizing the influential input features that contribute to predictions, often overlooking vulnerable regions that are highly sensitive to small perturbations. Traditional norm-based adversarial example generation algorithms, lacking spatial constraints, tend to distribute adversarial perturbations across the entire image, making it hard to localize these vulnerable regions. To address this gap, we introduce a method that uncovers vulnerable regions by placing adversarial perturbations at diverse locations. Specifically, our method operates within a one-pixel paradigm, enabling pixel-level vulnerability assessment by evaluating the effect of each individual perturbation on the model's prediction. By leveraging the Sharing Differential Evolution Algorithm, we can simultaneously identify multiple one-pixel perturbations that together form a vulnerable region. We conduct thorough experiments across a variety of network architectures and adversarial training techniques, showing that our approach not only effectively identifies vulnerable regions but also provides valuable insights into the inherent vulnerabilities of a diverse range of deep learning models.
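The search procedure described in the abstract (one-pixel perturbations selected by differential evolution with fitness sharing, so that surviving candidates occupy diverse image locations) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: `model_predict` is a placeholder for the attacked DNN, and the candidate encoding (x, y, r, g, b), sharing radius, and DE hyperparameters are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): one-pixel
# perturbations searched with differential evolution plus fitness sharing,
# so the surviving candidates spread over diverse locations and jointly
# outline a candidate vulnerable region.
import numpy as np

H, W = 32, 32                       # assumed image size (e.g. CIFAR-10 scale)
rng = np.random.default_rng(0)

def model_predict(image, label):
    """Hypothetical stand-in for the attacked DNN: returns the confidence
    assigned to `label`. A dummy function keeps the sketch self-contained."""
    return float(np.clip(1.0 - image.std(), 0.0, 1.0))

def apply_pixel(image, cand):
    """Apply one single-pixel perturbation encoded as (x, y, r, g, b)."""
    x, y = int(cand[0]) % W, int(cand[1]) % H
    out = image.copy()
    out[y, x, :] = np.clip(cand[2:5], 0.0, 1.0)
    return out

def confidence_drop(image, label, cand):
    """Raw fitness: how much one perturbed pixel lowers the label's confidence."""
    return model_predict(image, label) - model_predict(apply_pixel(image, cand), label)

def niche(xy, xy_pop, radius=4.0):
    """Niche count under a triangular sharing kernel; a candidate's own
    contribution is 1, so the count never reaches zero."""
    d = np.linalg.norm(xy_pop - xy, axis=1)
    return np.maximum(1.0 - d / radius, 0.0).sum()

def evolve(image, label, pop_size=40, iters=50, F=0.5, CR=0.9):
    """Differential evolution with fitness sharing over (x, y, r, g, b) candidates."""
    pop = np.column_stack([
        rng.uniform(0, W, pop_size), rng.uniform(0, H, pop_size),
        rng.uniform(0, 1, (pop_size, 3)),
    ])
    fits = np.array([confidence_drop(image, label, c) for c in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # DE/rand/1 mutation with simple per-gene crossover.
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            trial = np.where(rng.random(5) < CR, a + F * (b - c), pop[i])
            f_trial = confidence_drop(image, label, trial)
            # Shared-fitness selection: divide the raw drop by local crowding,
            # so candidates in under-explored locations are preferred.
            xy_with_trial = pop[:, :2].copy()
            xy_with_trial[i] = trial[:2]
            if (f_trial / niche(trial[:2], xy_with_trial)
                    >= fits[i] / niche(pop[i, :2], pop[:, :2])):
                pop[i], fits[i] = trial, f_trial
    return pop, fits

if __name__ == "__main__":
    image = rng.random((H, W, 3))    # random stand-in for a real input image
    pixels, drops = evolve(image, label=0)
    # The highest-scoring, spatially diverse pixels jointly outline a vulnerable region.
    top = pixels[np.argsort(drops)[-5:], :2].astype(int) % np.array([W, H])
    print("candidate vulnerable pixel locations (x, y):", top.tolist())
```

The sharing step divides each candidate's confidence drop by how crowded its pixel neighbourhood is, so the population spreads over several sensitive locations instead of collapsing onto the single most damaging pixel; taken together, the high-scoring pixels sketch the region the abstract refers to as vulnerable.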
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3295