Abstract: Building robust deep neural network (DNN) machine learning models in adversarial settings is a problem of great importance to communication and cyber security. We consider white-box attacks, in which the adversary has full knowledge of the learning architecture but whose ability to perturb the input is bounded in an Lp norm. Since adversarial examples are generated via small perturbations of the input, we develop a scalable mathematical framework that yields bounds on the effect of such input perturbations on the network output. We analyze several typical DNN components: linear transformations, ReLU, sigmoid, and double ReLU units. We use the well-studied MNIST dataset for experimental validation and present results and insights.
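To illustrate the kind of per-component bound such a framework targets (a sketch of a standard Lipschitz-style argument, not the paper's actual derivation): for a single linear transformation followed by ReLU, the 1-Lipschitz property of ReLU gives ||relu(W(x+d)) - relu(Wx)||_2 <= ||W||_2 ||d||_2, where ||W||_2 is the spectral norm of W. A minimal NumPy check of this inequality, with all names and dimensions hypothetical:

```python
import numpy as np

def relu(z):
    # ReLU is 1-Lipschitz: |relu(a) - relu(b)| <= |a - b| elementwise.
    return np.maximum(z, 0.0)

# Hypothetical layer and input; the paper derives its bounds analytically,
# this only demonstrates the inequality numerically for one linear + ReLU unit.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))      # linear transformation
x = rng.standard_normal(64)            # clean input
delta = 1e-3 * rng.standard_normal(64) # small input perturbation

# Output change through the unit vs. the spectral-norm bound (L2 sense):
#   ||relu(W(x+delta)) - relu(Wx)||_2 <= ||W||_2 * ||delta||_2
output_change = np.linalg.norm(relu(W @ (x + delta)) - relu(W @ x))
bound = np.linalg.norm(W, 2) * np.linalg.norm(delta)

assert output_change <= bound + 1e-12
print(f"observed change: {output_change:.6f}, bound: {bound:.6f}")
```

For deeper networks, per-layer constants of this kind can be composed by multiplying them along the layers, which is one common way such perturbation bounds are made to scale; whether the paper follows exactly this composition is an assumption here.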