Keywords: Adversarial Defense, robust deep neural networks
Abstract: Deep learning models have shown impressive performance across a spectrum of computer vision applications, including medical diagnosis and autonomous driving. A major concern with these models is their susceptibility to adversarial attacks. Recognizing the importance of this issue, a growing number of researchers are working on robust models that are less affected by adversarial attacks. Adversarial training shows promising results in this direction: models are trained on mini-batches augmented with adversarial samples. To scale adversarial training to large networks and datasets, fast and simple methods (e.g., single-step gradient ascent) are used to generate the adversarial samples. However, models trained with single-step adversarial training (where adversarial samples are generated by a non-iterative method) have been shown to be only pseudo-robust, and this pseudo robustness has been attributed to the gradient masking effect. Existing works fail to explain when and why gradient masking occurs during single-step adversarial training. In this work, (i) we show that models trained with single-step adversarial training learn to prevent the generation of single-step adversaries, and that this is due to over-fitting of the model during the initial stages of training, and (ii) to mitigate this effect, we propose a single-step adversarial training method with dropout scheduling for learning robust models. Unlike models trained with conventional single-step adversarial training, models trained with the proposed method are robust against both single-step and multi-step adversarial attacks, and achieve results on par with the computationally expensive state-of-the-art multi-step adversarial training method, in both white-box and black-box settings.
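To make the idea concrete, below is a minimal PyTorch-style sketch of single-step (FGSM-based) adversarial training combined with a dropout schedule. It assumes image inputs in [0, 1], a cross-entropy loss, and a linear decay of the dropout probability over epochs; the abstract does not specify the paper's exact schedule or attack parameters, so these choices (including the helper names `fgsm_attack` and `train_single_step_adv`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def fgsm_attack(model, x, y, epsilon):
    """Generate single-step (FGSM) adversarial samples for a batch."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # One gradient-ascent step on the input, clipped to the valid pixel range.
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()


def train_single_step_adv(model, dropout_layer, loader, optimizer,
                          epochs, epsilon, p_init=0.5):
    """Single-step adversarial training with a scheduled dropout rate.

    The dropout probability starts at p_init (to curb over-fitting in the
    early stages of training) and is decayed linearly to zero; this linear
    schedule is a hypothetical choice for illustration.
    """
    for epoch in range(epochs):
        dropout_layer.p = p_init * (1.0 - epoch / max(1, epochs - 1))
        model.train()
        for x, y in loader:
            # Augment the mini-batch with single-step adversarial samples.
            x_adv = fgsm_attack(model, x, y, epsilon)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(torch.cat([x, x_adv])),
                                   torch.cat([y, y]))
            loss.backward()
            optimizer.step()
```

Here `dropout_layer` is assumed to be an `nn.Dropout` module inside `model` whose probability `p` can be updated in place each epoch.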