ADD-Defense: Towards Defending Widespread Adversarial Examples via Perturbation-Invariant Representation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: defense framework, widespread adversarial examples, perturbation-invariant representation, adversarial learning
Abstract: Machine learning algorithms are vulnerable to adversarial examples, which are challenging to defend against. Recently, various defenses have been proposed to mitigate the negative effects of adversarial examples generated by known attacks; however, these methods have clear limitations against unknown attacks. Cognitive science suggests that the brain can recognize the same person under any facial expression by extracting invariant information from the face. Similarly, different adversarial examples share invariant information retained from the original examples. Motivated by this observation, we propose a defense framework, ADD-Defense, which extracts this invariant information, called the perturbation-invariant representation (PIR), to defend against widespread adversarial examples. Specifically, the PIR is learned through adversarial training that exploits perturbation-specific information, so it is invariant to known attacks and carries no perturbation-specific information. To address the imbalance between widespread unknown attacks and the limited set of known attacks, the PIR is matched to a Gaussian prior distribution and is therefore expected to generalize well to unknown attacks. In this way, the PIR is invariant to both known and unknown attacks. Once the PIR is learned, we can generate an example without malicious perturbations as the output. We evaluate ADD-Defense using various pixel-constrained and spatially-constrained attacks, notably BPDA and AutoAttack. The empirical results illustrate that ADD-Defense is robust to widespread adversarial examples.
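The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of one plausible reading of the training objective: an encoder produces the PIR, a discriminator adversarially strips perturbation-specific (attack-type) information via gradient reversal, a KL term matches the PIR to a Gaussian prior, and a decoder outputs the purified example. All module sizes, the loss weights (`beta`, `lam`), the gradient-reversal realization of the adversarial term, and the attack-type labels are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of an ADD-Defense-style objective (assumed, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity on the forward pass, negated gradient on the
    backward pass -- a common way to implement an adversarial (min-max) term."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

class Encoder(nn.Module):
    # Maps a flattened example to the mean/log-variance of the PIR so the
    # representation can be matched to a Gaussian prior (VAE-style).
    def __init__(self, in_dim=784, z_dim=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

encoder = Encoder()
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())
# The discriminator tries to recover perturbation-specific information
# (here: which known attack produced the input) from the PIR.
n_attacks = 3  # assumed number of known attack types, including "clean"
discriminator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                              nn.Linear(128, n_attacks))

def training_loss(x_adv, x_clean, attack_label, beta=1.0, lam=0.1):
    mu, logvar = encoder(x_adv.flatten(1))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = decoder(z)                                    # purified output
    rec_loss = F.mse_loss(recon, x_clean.flatten(1))
    # Match the PIR to a N(0, I) prior to aid generalization to unknown attacks.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Adversarial term: the discriminator classifies the attack type, while the
    # reversed gradient pushes the encoder to discard that information.
    logits = discriminator(GradReverse.apply(z))
    adv_loss = F.cross_entropy(logits, attack_label)
    return rec_loss + beta * kl + lam * adv_loss

if __name__ == "__main__":
    # Toy usage with random MNIST-shaped tensors.
    x_adv, x_clean = torch.rand(8, 1, 28, 28), torch.rand(8, 1, 28, 28)
    labels = torch.randint(0, n_attacks, (8,))
    loss = training_loss(x_adv, x_clean, labels)
    loss.backward()
    print(loss.item())
```

At test time, following the abstract, an input is encoded to its PIR and decoded, yielding an output without malicious perturbations that can then be passed to the downstream classifier.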
One-sentence Summary: To defend against widespread adversarial examples, we propose ADD-Defense, a defense that extracts the invariant information of adversarial examples, called the perturbation-invariant representation.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=BwEzfHzAAO