Certified Defenses against Adversarial Examples
Aditi Raghunathan, Jacob Steinhardt, Percy Liang
Feb 15, 2018 (modified: Feb 24, 2018), ICLR 2018 Conference Blind Submission, readers: everyone
Abstract: While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but these are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most $\epsilon = 0.1$ can cause more than $35\%$ test error.
TL;DR: We demonstrate a certifiable, trainable, and scalable method for defending against adversarial examples.
Keywords: adversarial examples, certificate of robustness, convex relaxations
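To make the idea of a robustness certificate from the abstract concrete, here is a minimal sketch in Python. It is not the semidefinite relaxation proposed in the paper; it swaps in a much simpler and looser interval bound, and it assumes a one-hidden-layer ReLU network f(x) = V relu(W x + b) + c. The function names, parameter names, and the toy weights are all hypothetical and chosen only for illustration. The point it illustrates is the same, though: if an upper bound on the worst-case margin is negative, no perturbation within the $\epsilon$-ball can change the prediction.

```python
def interval_margin_bound(W, b, V, c, x, eps, label):
    """Upper-bound max_j (f_j(x') - f_label(x')) over all x' with
    max_i |x'_i - x_i| <= eps, for f(x) = V relu(W x + b) + c.

    Assumption: this is a simple interval bound, NOT the paper's
    semidefinite relaxation. It is sound but generally looser.
    """
    lo, hi = [], []
    for w_row, b_i in zip(W, b):
        # Pre-activation at the clean input, and how far it can move.
        center = sum(w * xi for w, xi in zip(w_row, x)) + b_i
        radius = eps * sum(abs(w) for w in w_row)
        # ReLU is monotone, so it maps the interval endpoint to endpoint.
        lo.append(max(center - radius, 0.0))
        hi.append(max(center + radius, 0.0))

    worst = float("-inf")
    for j in range(len(V)):
        if j == label:
            continue
        # Margin f_j - f_label is linear in the hidden activations.
        diff = [vj - vl for vj, vl in zip(V[j], V[label])]
        bound = c[j] - c[label]
        for d, l, h in zip(diff, lo, hi):
            # Pick the interval endpoint that maximizes each term.
            bound += d * h if d > 0 else d * l
        worst = max(worst, bound)
    return worst


def is_certified(W, b, V, c, x, eps, label):
    """True if no eps-bounded perturbation can change the prediction."""
    return interval_margin_bound(W, b, V, c, x, eps, label) < 0


# Hypothetical toy network: 2 inputs, 2 hidden units, 2 classes.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
V = [[1.0, 0.0], [0.0, 1.0]]
c = [0.0, 0.0]
x = [1.0, 0.0]

small = is_certified(W, b, V, c, x, eps=0.1, label=0)
large = is_certified(W, b, V, c, x, eps=0.6, label=0)
```

On this toy network the input is certified at the small radius but not at the large one. A failed certificate only means the bound is inconclusive, not that an attack exists; tightening that gap is the role a stronger relaxation plays.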