Keywords: adversarial attack characterization, local linearity, adversarial response characteristics, sequel attack effect
Abstract: Adversarial attacks pose safety and security concerns to deep learning
applications, but their characteristics are under-explored. Although their
perturbations are largely imperceptible, PGD-like attacks may leave a strong
trace in an adversarial example. Recall that PGD-like attacks trigger the ``local
linearity'' of a network, which implies different extents of linearity for
benign or adversarial examples. Inspired by this, we construct an Adversarial
Response Characteristics (ARC) feature to reflect the model's gradient
consistency around the input to indicate the extent of linearity. Under
certain conditions, it qualitatively shows a gradually varying pattern from
benign examples to adversarial examples, as the latter lead to the Sequel Attack
Effect (SAE). To quantitatively evaluate the effectiveness of ARC, we conduct
experiments on CIFAR-10 and ImageNet for attack detection and attack type
recognition in a challenging setting. The results suggest that SAE is an
effective and unique trace of PGD-like attacks reflected through the ARC
feature. The ARC feature is intuitive, lightweight, non-intrusive, and
data-undemanding.
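The notion of "gradient consistency around the input" can be illustrated with a minimal sketch. This is not the paper's ARC definition, which the abstract does not spell out. It assumes a hypothetical toy model f(x) = v · tanh(Wx), takes a few signed-gradient steps away from the input (a PGD-like trajectory), and reports the cosine similarity between the gradients collected along the way. The names `loss_grad`, `gradient_consistency`, `steps`, and `step_size` are all illustrative assumptions.

```python
import numpy as np

def loss_grad(W, v, x):
    # Hypothetical toy model: f(x) = v . tanh(W x).
    # Its gradient w.r.t. x is W^T (v * (1 - tanh(W x)^2)).
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h ** 2))

def gradient_consistency(W, v, x, steps=4, step_size=0.01):
    # Collect gradients along a short signed-gradient trajectory
    # starting at x (an assumption, not the paper's procedure).
    grads = []
    x_t = x.copy()
    for _ in range(steps):
        g = loss_grad(W, v, x_t)
        grads.append(g)
        x_t = x_t + step_size * np.sign(g)
    G = np.stack(grads)
    # Normalize each gradient so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    G = G / np.maximum(norms, 1e-12)
    return G @ G.T

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(8, 16))
v = rng.normal(size=8)
x = rng.normal(size=16)

C = gradient_consistency(W, v, x)
print(C.shape)
```

Entries of `C` close to 1 mean the gradient barely changes direction along the trajectory, i.e. the model behaves almost linearly near `x`; smaller entries indicate stronger curvature. In this sketch, a perfectly linear model would give a matrix of all ones.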
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
TL;DR: We present ARC features where SAE is a unique trace left by PGD-like attacks.
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning