Target Training: Tricking Adversarial Attacks to Fail

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: adversarial machine learning
Abstract: Recent adversarial defense approaches have failed. Untargeted gradient-based attacks cause classifiers to choose any wrong class. Our novel white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes, and from these target classes we derive the real classes. The Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks: minimizing the sum of (1) the perturbation and (2) the classifier's adversarial loss. Target Training changes the classifier minimally and trains it with additional duplicated points (at zero distance from the originals) labeled with designated classes. These differently-labeled duplicated samples minimize both terms (1) and (2) of the minimization, steering attack convergence toward samples of the designated classes, from which the correct classification is derived. Importantly, Target Training eliminates the need to know the attack and the overhead of generating adversarial samples for attacks that minimize perturbations. Without using adversarial samples and against an adaptive attack aware of our defense, Target Training exceeds even the default, unsecured classifier accuracy of 84.3% for CIFAR10, reaching 86.6% against the DeepFool attack, and achieves 83.2% against the CW-$L_2$ (κ=0) attack. Using adversarial samples, we achieve 75.6% against CW-$L_2$ (κ=40). Due to our deliberate choice of low-capacity classifiers, Target Training does not withstand $L_\infty$ adaptive attacks on CIFAR10 but withstands CW-$L_\infty$ (κ=0) on MNIST. Target Training presents a fundamental change in adversarial defense strategy.
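A minimal sketch of the mechanism, assuming the common CW-style untargeted objective (the paper's exact formulation is not reproduced in this abstract): for an input $x$ with perturbation $\delta$, trade-off constant $c$, and adversarial loss $f$ that drops once the predicted class differs from the original label, the attack approximately solves
$$\min_{\delta}\;\|\delta\|_p + c\,f(x+\delta).$$
A duplicated training point at zero distance ($\delta = 0$) that the classifier has learned to label with a designated class makes term (1) zero and term (2) small, so the attack tends to converge toward these designated-class samples, from which the real class is then derived as described above.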
One-sentence Summary: Target Training tricks untargeted attacks into becoming attacks targeted at designated target classes.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=4rtHXkqItt
