Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Adversarial Machine Learning at Scale
Alexey Kurakin, Ian J. Goodfellow, Samy Bengio
Nov 03, 2016 (modified: Feb 11, 2017)ICLR 2017 conference submissionreaders: everyone
Abstract:Adversarial examples are malicious inputs designed to fool machine learning models.
They often transfer from one model to another, allowing attackers to mount black
box attacks without knowledge of the target model's parameters.
Adversarial training is the process of explicitly training a model on adversarial
examples, in order to make it more robust to attack or to reduce its test error
on clean inputs.
So far, adversarial training has primarily been applied to small problems.
In this research, we apply adversarial training to ImageNet.
Our contributions include:
(1) recommendations for how to succesfully scale adversarial training to large models and datasets,
(2) the observation that adversarial training confers robustness to single-step attack methods,
(3) the finding that multi-step attack methods are somewhat less transferable than single-step attack
methods, so single-step attacks are the best for mounting black-box attacks,
(4) resolution of a ``label leaking'' effect that causes adversarially trained models to perform
better on adversarial examples than on clean examples, because the adversarial
example construction process uses the true label and the model can learn to
exploit regularities in the construction process.
Keywords:Computer vision, Supervised Learning
Enter your feedback below and we'll get back to you as soon as possible.