Asymmetric Certified Robustness via Feature-Convex Neural Networks

Published: 21 Sept 2023, Last Modified: 20 Dec 2023NeurIPS 2023 posterEveryoneRevisionsBibTeX
Keywords: asymmetric certified robustness, input-convex neural networks
TL;DR: This work composes input-convex neural networks with a Lipschitz feature map to produce a classifier with closed-form asymmetric certified radii for any norm.
Abstract: Real-world adversarial attacks on machine learning models often feature an asymmetric structure wherein adversaries only attempt to induce false negatives (e.g., classify a spam email as not spam). We formalize the asymmetric robustness certification problem and correspondingly present the feature-convex neural network architecture, which composes an input-convex neural network (ICNN) with a Lipschitz continuous feature map in order to achieve asymmetric adversarial robustness. We consider the aforementioned binary setting with one "sensitive" class, and for this class we prove deterministic, closed-form, and easily-computable certified robust radii for arbitrary $\ell_p$-norms. We theoretically justify the use of these models by characterizing their decision region geometry, extending the universal approximation theorem for ICNN regression to the classification setting, and proving a lower bound on the probability that such models perfectly fit even unstructured uniformly distributed data in sufficiently high dimensions. Experiments on Malimg malware classification and subsets of the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that feature-convex classifiers attain substantial certified $\ell_1$, $\ell_2$, and $\ell_{\infty}$-radii while being far more computationally efficient than competitive baselines.
Supplementary Material: zip
Submission Number: 3707