Keywords: deep learning, adversarial attack, robust certification
Abstract: Recent work has demonstrated that neural networks are vulnerable to small adversarial perturbations of their input. In this paper, we propose an efficient regularization scheme, inspired by convex geometry and barrier methods, to improve the robustness of feedforward ReLU networks. Since such networks are piecewise linear, they partition the input space into polyhedral regions (polytopes). Our regularizer is designed to minimize the distance between training samples and the \textit{analytical centers} of their respective polytopes, pushing points away from the polytope boundaries. The regularizer \textit{provably} increases a lower bound on the adversarial perturbation required to switch an example's label. Adding a second regularizer that encourages linear decision boundaries improves robustness further while avoiding over-regularization of the classifier. We demonstrate the robustness of our approach with respect to $\ell_\infty$ and $\ell_2$ adversarial perturbations on multiple datasets. Our method is competitive with state-of-the-art algorithms for learning robust networks, and applying it in conjunction with adversarial training boosts the robustness of classifiers even further.
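The core idea admits a short sketch. Below is a minimal, hypothetical PyTorch illustration, not the paper's implementation: for a one-hidden-layer ReLU network, the linear region containing an input $x$ is bounded by the hyperplanes where the pre-activations $z_i(x)$ vanish, and a log-barrier penalty on the normalized facet distances $|z_i| / \lVert w_i \rVert$ is the standard surrogate for pulling $x$ toward the analytic center of its region. The names `ToyReLUNet` and `barrier_regularizer`, the weight 0.1, and the restriction to first-layer facets are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Sketch (assumed, not the authors' exact formulation): in a feedforward ReLU
# network, the polytope containing x is cut out by the hyperplanes z_i(x) = 0,
# where z_i are the pre-activations.  Penalizing -log of the distances to
# these facets pushes x toward the analytic center of its polytope.

class ToyReLUNet(nn.Module):  # hypothetical two-layer network for illustration
    def __init__(self, d_in=2, d_hidden=16, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        self.z1 = self.fc1(x)  # pre-activations: one facet hyperplane each
        return self.fc2(torch.relu(self.z1))

def barrier_regularizer(net, x, eps=1e-6):
    """Log-barrier on distances from x to the facets of its linear region.

    For the first layer the hyperplane normals are simply the rows of W1,
    so the distance from x to facet i is |z1_i| / ||w1_i||.  Deeper layers
    would require the input-dependent effective weights; omitted here.
    """
    net(x)                                   # populate pre-activations
    w_norm = net.fc1.weight.norm(dim=1)      # ||w1_i|| per hidden unit
    dist = net.z1.abs() / (w_norm + eps)     # distance to each facet
    return -torch.log(dist + eps).mean()     # barrier: large near facets

net = ToyReLUNet()
x, y = torch.randn(8, 2), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(net(x), y) + 0.1 * barrier_regularizer(net, x)
loss.backward()
```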
One-sentence Summary: We propose a novel geometric regularization term that provably improves the robustness of neural networks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=i_eKfmWg3K