Keywords: Adversarial Attacks, Deep Learning, Optimal Transport, Residual Networks, Regularization
TL;DR: We propose a detector of adversarial attacks inspired by the dynamical-systems view of neural networks, together with a regularization that improves both adversarial-attack detection and test accuracy.
Abstract: Adversarial attacks are perturbations of the input that do not change its class for a human observer but fool a neural network into changing its prediction. In this paper, we propose a detector of such attacks based on the view of residual networks as discrete dynamical systems. The detector distinguishes clean inputs from abnormal ones by comparing the discrete vector fields they follow through the network's layers before the final classification layer. This detector compares favorably to other detectors on both seen and unseen attacks. We also show that regularizing this vector field during training makes the network more regular on the data distribution's support, making its activations on clean samples more distinguishable from those on abnormal samples. This regularization of the network's dynamics improves the performance of any detection method that uses the internal embeddings as inputs, while also improving the network's test accuracy.
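The dynamical-systems view in the abstract treats a residual network as the iteration x_{l+1} = x_l + f_l(x_l), so each input traces a "discrete vector field" of increments f_l(x_l) across layers. The toy sketch below (not the authors' code; the residual function, weights, and the simple L2 statistic are all hypothetical stand-ins) illustrates how such a field could be extracted and compared between inputs, which is the kind of signal a detector could threshold:

```python
import math

# Illustrative only: a toy residual network x_{l+1} = x_l + f_l(x_l).
# The "discrete vector field" an input follows is the sequence of
# increments f_l(x_l); all names and functions here are hypothetical.

def residual_block(x, weight):
    # toy residual function f_l: a scaled tanh nonlinearity
    return [weight * math.tanh(v) for v in x]

def vector_field(x, weights):
    """Return the list of increments f_l(x_l) along the trajectory."""
    field = []
    for w in weights:
        inc = residual_block(x, w)
        field.append(inc)
        x = [a + b for a, b in zip(x, inc)]  # x_{l+1} = x_l + f_l(x_l)
    return field

def field_distance(f1, f2):
    """L2 distance between two discrete vector fields; one simple
    statistic a detector could threshold to flag abnormal inputs."""
    return sum((a - b) ** 2
               for l1, l2 in zip(f1, f2)
               for a, b in zip(l1, l2)) ** 0.5

weights = [0.5, 0.3, 0.2]
clean = vector_field([1.0, -0.5], weights)
perturbed = vector_field([1.2, -0.7], weights)
print(field_distance(clean, perturbed))  # a perturbed input shifts the field
```

An actual detector would compare a test input's field against reference statistics gathered from clean training data, rather than against a single other input as in this sketch.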
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning