Deflecting Adversarial Attacks

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Withdrawn Submission
Abstract: There has been an ongoing cycle in which stronger detection mechanisms and defenses against adversarial attacks are subsequently broken by more advanced, defense-aware attacks. We present a new approach, which we argue is a step towards ending this cycle, that deflects adversarial attacks, i.e., forces the attacker to produce an input that semantically resembles the attack's target class. To this end, we first propose a stronger defense based on capsule networks that combines three detection mechanisms to achieve state-of-the-art detection performance against both standard and defense-aware attacks. We then show, through a human study in which participants label the class of images produced by the attack, that attacks which evade detection by our defense are often perceived as belonging to the adversarial target class. Such attack images can thus no longer meaningfully be called adversarial, because our network classifies them the same way humans do.
Keywords: Adversarial Examples
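The abstract refers to a capsule-network defense that combines three detection mechanisms but does not spell them out here. As a minimal, hypothetical sketch of one common ingredient of reconstruction-based detection (not the paper's actual mechanism), the snippet below flags inputs whose decoder reconstruction error exceeds a threshold calibrated on clean data; the function names, the reconstruction interface, and the threshold value are all illustrative assumptions.

```python
import torch


def reconstruction_error(x: torch.Tensor, x_recon: torch.Tensor) -> torch.Tensor:
    """Per-example mean squared error between a batch of inputs and their reconstructions."""
    return ((x - x_recon) ** 2).flatten(start_dim=1).mean(dim=1)


def detect_adversarial(x: torch.Tensor, x_recon: torch.Tensor, threshold: float) -> torch.Tensor:
    """Flag examples whose reconstruction error exceeds a threshold tuned on clean inputs.

    `threshold` is a hypothetical value; in practice it would be calibrated,
    e.g., to a fixed false-positive rate on held-out clean data.
    """
    return reconstruction_error(x, x_recon) > threshold


# Toy usage: a faithfully reconstructed batch versus a poorly reconstructed one.
x = torch.rand(4, 1, 28, 28)                    # e.g., MNIST-sized inputs
good_recon = x + 0.01 * torch.randn_like(x)     # small reconstruction error
bad_recon = torch.rand_like(x)                  # large reconstruction error
print(detect_adversarial(x, good_recon, threshold=0.05))  # mostly False
print(detect_adversarial(x, bad_recon, threshold=0.05))   # mostly True
```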