A Distributional Robustness Perspective on Adversarial Training with the $\infty$-Wasserstein Distance
Abstract: While ML tools are increasingly used in industrial applications, adversarial examples remain a critical flaw of neural networks. These imperceptible perturbations of natural inputs are, on average, misclassified by most state-of-the-art classifiers. By slightly modifying each data point, the attacker creates a new distribution of inputs for the classifier. In this work, we treat the distribution of adversarial examples as a small shift of the original distribution. We thus propose to address the problem of adversarial training (AT) within the framework of distributionally robust optimization (DRO). We show a formal connection between our formulation and optimal transport by relaxing AT into a DRO problem with an $\infty$-Wasserstein constraint. This connection motivates the use of an entropic regularizer, a standard tool in optimal transport, for our problem. We then prove the existence and uniqueness of an optimal regularized distribution of adversarial examples against a class of classifiers (e.g., a given architecture), which we eventually use to robustly train a classifier. Using these theoretical insights, we propose to use Langevin Monte Carlo to sample from this optimal distribution of adversarial examples and to train robust classifiers that outperform the standard baseline, with speed-ups of $\times 200$ on MNIST and $\times 8$ on CIFAR-10.
One-sentence Summary: Adversarial training studied within the distributionally robust optimization framework, using the $\infty$-Wasserstein distance and drawing inspiration from optimal transport theory.
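To make the sampling idea in the abstract concrete, here is a minimal sketch of a projected, unadjusted Langevin sampler. It is not the authors' algorithm: it assumes the target density over perturbed inputs is proportional to exp(loss / temperature) restricted to an $\ell_\infty$ ball of radius `eps` around the clean input, and it uses a toy linear logistic classifier. The names `langevin_adversarial_samples`, `logistic_loss_grad`, `temperature`, and `step` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def logistic_loss_grad(x, w, y):
    """Gradient w.r.t. x of log(1 + exp(-y * w.x)) for a toy linear classifier (assumption)."""
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))


def langevin_adversarial_samples(x0, grad_loss, eps, step=1e-3,
                                 temperature=0.1, n_steps=200, rng=None):
    """Projected unadjusted Langevin sketch.

    Targets a density proportional to exp(loss(x) / temperature) on the
    l_inf ball of radius eps around x0 (an assumed form of the regularized
    adversarial distribution, not taken from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Start from a uniform random point inside the ball.
    x = x0 + rng.uniform(-eps, eps, size=x0.shape)
    for _ in range(n_steps):
        noise = rng.standard_normal(x0.shape)
        # Drift follows the gradient of the log-density (loss / temperature);
        # the Gaussian term injects the diffusion noise.
        x = x + step * grad_loss(x) / temperature + np.sqrt(2.0 * step) * noise
        # Project back onto the l_inf ball around the clean input.
        x = np.clip(x, x0 - eps, x0 + eps)
    return x


# Toy usage: one clean input, a fixed linear classifier, label +1.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
x0 = np.array([0.3, -0.1, 0.8])
eps = 0.1
x_adv = langevin_adversarial_samples(
    x0, lambda x: logistic_loss_grad(x, w, 1.0), eps, rng=rng
)
```

Lower values of `temperature` concentrate the samples near loss-maximizing points, which resembles a standard gradient-based attack; higher values spread the samples more evenly over the ball.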