Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations
Abstract: Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is their failure to perform well on data that differ from the training distribution, even when these differences are very small, as is the case with adversarial examples. We propose \emph{Fortified Networks}, a simple transformation of existing networks, which “fortifies” the hidden layers of a deep network by identifying when the hidden states are off the data manifold and mapping them back to parts of the manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily explained by gradient masking, i.e., deceptively good results caused by a degraded gradient signal; and (iii) show the advantage of performing this fortification in the hidden layers rather than in the input space. We demonstrate improvements in adversarial robustness on three datasets (MNIST, Fashion-MNIST, CIFAR-10), across several attack parameters, in both white-box and black-box settings, and against the most widely studied attacks (FGSM, PGD, Carlini-Wagner). We show that these improvements hold across a wide variety of hyperparameters.
Keywords: adversarial examples, adversarial training, autoencoders, hidden state
TL;DR: Better adversarial training by learning to map back to the data manifold with autoencoders in the hidden states.
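The sketch below illustrates the idea in the TL;DR, not the authors' actual implementation: a denoising autoencoder is inserted at a hidden layer, trained to map (noisy) hidden states back toward the clean hidden representation, and its reconstruction error is added to the task loss. The layer sizes, noise level, and loss weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FortifiedBlock(nn.Module):
    """Denoising autoencoder wrapped around a hidden representation (assumed architecture)."""

    def __init__(self, dim, bottleneck=64, noise_std=0.1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.decoder = nn.Linear(bottleneck, dim)
        self.noise_std = noise_std

    def forward(self, h):
        # Corrupt the hidden state during training, then reconstruct it.
        h_noisy = h + self.noise_std * torch.randn_like(h) if self.training else h
        h_rec = self.decoder(self.encoder(h_noisy))
        rec_loss = F.mse_loss(h_rec, h.detach())
        return h_rec, rec_loss


class FortifiedMLP(nn.Module):
    """Small classifier with one fortified hidden layer."""

    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fortify = FortifiedBlock(hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        h, rec_loss = self.fortify(h)
        return self.fc2(h), rec_loss


# One training step: task loss plus a weighted reconstruction loss
# (the weight 0.01 is a placeholder, not a value from the paper).
model = FortifiedMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, rec_loss = model(x)
loss = F.cross_entropy(logits, y) + 0.01 * rec_loss
opt.zero_grad()
loss.backward()
opt.step()
```

In this reading, the autoencoder acts as the "fortified" layer: hidden states pushed off the manifold (e.g., by an adversarial perturbation) are projected back toward the region the network was trained on before the downstream layers see them.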
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/fortified-networks-improving-the-robustness/code)