Keywords: adversarial attack, robustness, image classification, automatic recognition system
TL;DR: A black-box attack method that learns an imperceptible adversarial distribution
Abstract: An effective black-box attack should find a sweet spot that balances success rate, perceptual quality, and query efficiency. In this paper, we propose PadvFlow, a black-box attack method that achieves this balance. Instead of searching for examples in a conventional $\ell_p$ space, PadvFlow leverages normalizing flows (NFs) to model the density of natural, indistinguishable adversarial examples in a perceptual space. The expressive NFs reduce perceptible noise, while searching for adversarial samples in the perceptual space improves the fidelity of the generated details. PadvFlow can therefore generate perceptually natural adversarial examples. Our comprehensive experiments show that PadvFlow not only successfully attacks 6 undefended and 4 defended image classifiers on CIFAR-10 and SVHN, but also scales up to attack ImageNet classifiers at a resolution of $299\times299$ pixels. The effectiveness of PadvFlow is further validated on a different modality by attacking an automatic speech recognition system.
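To make the idea concrete, below is a minimal, hypothetical sketch of how a normalizing flow could parameterize a distribution over adversarial perturbations that is tuned from black-box queries alone. This is not the authors' implementation: the flow architecture (`TinyFlow`), the NES gradient estimator (`nes_gradient`), and the `black_box_probs` oracle are all assumptions, and the perceptual-space modeling described in the abstract is simplified here to a bounded pixel-space perturbation for brevity.

```python
# Hypothetical sketch (PyTorch), not PadvFlow itself: a tiny RealNVP-style
# flow models a distribution over perturbations, and its parameters are
# updated from black-box queries via an NES gradient estimate.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling: transform one half conditioned on the other.
    Assumes an even input dimension."""
    def __init__(self, dim, hidden=128, flip=False):
        super().__init__()
        self.flip = flip
        self.net = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, dim))              # outputs scale and shift
    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        if self.flip:
            z1, z2 = z2, z1
        s, t = self.net(z1).chunk(2, dim=-1)
        z2 = z2 * torch.tanh(s).exp() + t        # invertible affine transform
        if self.flip:
            z1, z2 = z2, z1
        return torch.cat([z1, z2], dim=-1)

class TinyFlow(nn.Module):
    """A short stack of coupling layers with alternating splits."""
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            AffineCoupling(dim, flip=i % 2 == 1) for i in range(n_layers))
    def forward(self, z):
        for layer in self.layers:
            z = layer(z)
        return z

def nes_gradient(loss_fn, x, sigma=1e-3, n=20):
    """Antithetic NES estimate of d(loss)/dx using only function queries."""
    g = torch.zeros_like(x)
    for _ in range(n):
        u = torch.randn_like(x)
        g += u * (loss_fn(x + sigma * u) - loss_fn(x - sigma * u))
    return g / (2 * sigma * n)

def attack(x, y, black_box_probs, steps=200, eps=8 / 255):
    """x: (1,C,H,W) image in [0,1]; y: true-class index;
    black_box_probs: query-only oracle returning class probabilities."""
    flow = TinyFlow(x.numel())
    opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
    for _ in range(steps):
        z = torch.randn(1, x.numel())
        delta = eps * torch.tanh(flow(z)).view_as(x)  # bounded perturbation
        x_adv = (x + delta).clamp(0, 1)

        def margin(inp):  # black-box margin loss from probabilities only
            p = black_box_probs(inp)
            true = p[0, y]
            other = p[0].clone()
            other[y] = -1.0
            return (true - other.max()).item()

        g = nes_gradient(margin, x_adv.detach())
        opt.zero_grad()
        x_adv.backward(g)  # chain the estimated image gradient into the flow
        opt.step()
    return x_adv.detach()
```

In the actual method, the search additionally operates in a perceptual space and the flow's exact log-density keeps samples close to the natural image distribution; both components are omitted here to keep the sketch short.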
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning