- TL;DR: A novel adversarial imitation attack to fool machine learning models.
- Abstract: Deep learning models are known to be vulnerable to adversarial examples. A practical adversarial attack should require as little as possible knowledge of attacked models T. Current substitute attacks need pre-trained models to generate adversarial examples and their attack success rates heavily rely on the transferability of adversarial examples. Current score-based and decision-based attacks require lots of queries for the T. In this study, we propose a novel adversarial imitation attack. First, it produces a replica of the T by a two-player game like the generative adversarial networks (GANs). The objective of the generative model G is to generate examples which lead D returning different outputs with T. The objective of the discriminative model D is to output the same labels with T under the same inputs. Then, the adversarial examples generated by D are utilized to fool the T. Compared with the current substitute attacks, imitation attack can use less training data to produce a replica of T and improve the transferability of adversarial examples. Experiments demonstrate that our imitation attack requires less training data than the black-box substitute attacks, but achieves an attack success rate close to the white-box attack on unseen data with no query.
- Keywords: Adversarial examples, Security, Machine learning, Deep neural network, Computer vision
- Original Pdf: pdf