Abstract: It is well-known that standard neural networks, even with high classification accuracy, are vulnerable to small $\ell_\infty$ perturbations. Many attempts have been made to learn a network that can resist such adversarial attacks. However, most previous works either can only provide empirical verification of the defense against a particular attack method or can only develop a theoretical guarantee of the model robustness in limited scenarios. In this paper, we develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses the $\ell_\infty$ distance as its basic operation, which we call the $\ell_\infty$-dist neuron. We show that the $\ell_\infty$-dist neuron is naturally a 1-Lipschitz function with respect to the $\ell_\infty$ norm, and that neural networks constructed with $\ell_\infty$-dist neurons ($\ell_{\infty}$-dist Nets) enjoy the same property. This directly provides a theoretical guarantee of certified robustness based on the margin of the prediction outputs. We further prove that $\ell_{\infty}$-dist Nets have sufficient expressive power to approximate any 1-Lipschitz function, and that they generalize well, as the robust test error can be upper-bounded by the performance of a large-margin classifier on the training data. Preliminary experiments show that, even without the help of adversarial training, the learned networks attain high classification accuracy and are already provably robust.
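Below is a minimal numpy sketch (not the authors' implementation) of the two ideas stated in the abstract: an $\ell_\infty$-dist neuron computing the $\ell_\infty$ distance between the input and a weight vector, and the margin-based robustness certificate that follows from every output logit being 1-Lipschitz w.r.t. the $\ell_\infty$ norm. The bias term and the function names (`linf_dist_neuron`, `certified_margin_radius`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def linf_dist_neuron(x, w, b=0.0):
    """l_inf-dist neuron: the l_inf distance between input x and weight
    vector w, plus an (assumed) bias. By the reverse triangle inequality,
    this map is 1-Lipschitz with respect to the l_inf norm."""
    return np.max(np.abs(x - w)) + b

def certified_margin_radius(logits):
    """If every logit is 1-Lipschitz w.r.t. the l_inf norm, the margin
    between the top two logits shrinks by at most 2*eps under any
    perturbation of l_inf radius eps, so the prediction is provably
    unchanged for all radii up to half the margin."""
    top2 = np.sort(logits)[-2:]
    return (top2[1] - top2[0]) / 2.0

# Empirical 1-Lipschitz check: |u(x) - u(y)| <= ||x - y||_inf.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = rng.normal(size=8)
y = x + rng.uniform(-0.1, 0.1, size=8)  # small l_inf perturbation of x
lhs = abs(linf_dist_neuron(x, w) - linf_dist_neuron(y, w))
rhs = np.max(np.abs(x - y))
assert lhs <= rhs + 1e-12

# Margin-based certificate on a toy logit vector.
logits = np.array([2.3, 0.9, 1.7])
print(f"certified l_inf radius: {certified_margin_radius(logits):.2f}")  # 0.30
```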
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=3_fQ7F0F_0