Attacking the Madry Defense Model with $L_1$-based Adversarial Examples

Yash Sharma, Pin-Yu Chen

Feb 12, 2018 (modified: Jun 04, 2018), ICLR 2018 Workshop Submission
  • Abstract: The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by a scaled maximal $L_\infty$ distortion of $\epsilon = 0.3$. This decision discourages the use of attacks which are not optimized on the $L_\infty$ distortion metric. Our experimental results demonstrate that by relaxing the $L_\infty$ constraint of the competition, the **e**lastic-net **a**ttack to **d**eep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion. These results call into question the use of $L_\infty$ as a sole measure for visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
  • Keywords: Adversarial Attacks, Adversarial Defenses, Adversarial Training, PGD, EAD, Distortion Metrics
  • TL;DR: EAD can generate minimally visually distorted adversarial examples which transfer to the Madry Defense Model, calling into question the use of $L_\infty$ as a sole measure for visual distortion, and further demonstrating the power of EAD at generating robust adversarial examples.
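
EAD, as described in the abstract, is built on an elastic-net regularized attack objective that combines an $L_1$ and an $L_2$ penalty on the perturbation, which is what produces sparse, low-visual-distortion examples even when the $L_\infty$ distortion is large. The sketch below is a minimal NumPy illustration of that idea, assuming a standard formulation $c \cdot f(x+\delta) + \beta \|\delta\|_1 + \|\delta\|_2^2$ and the soft-thresholding (ISTA-style) step commonly used to handle the $L_1$ term; the function names and constants are illustrative, not the authors' implementation.

```python
import numpy as np

def elastic_net_objective(delta, attack_loss, c=1.0, beta=1e-2):
    """Elastic-net attack objective on a perturbation `delta`:
    c * f(x + delta) + beta * ||delta||_1 + ||delta||_2^2,
    where `attack_loss` stands in for the classification loss f(x + delta)."""
    return c * attack_loss + beta * np.abs(delta).sum() + (delta ** 2).sum()

def soft_threshold(delta, beta):
    """ISTA-style shrinkage for the L1 term: each component of the
    candidate perturbation is pulled toward zero by beta, and components
    smaller than beta in magnitude are zeroed out, yielding sparsity."""
    return np.sign(delta) * np.maximum(np.abs(delta) - beta, 0.0)

# Example: shrinkage zeroes out tiny perturbation components.
delta = np.array([0.5, -0.005, 0.2])
sparse_delta = soft_threshold(delta, beta=0.01)
```

Intuitively, the $L_1$ term concentrates the distortion on a few pixels, so an EAD example can exceed an $L_\infty$ budget like $\epsilon = 0.3$ on those pixels while remaining visually close to the original image.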
