Keywords: Adversarial Examples, Robustness, Neural Networks, Classification
TL;DR: We prove that adversarial examples exist on trained 2-layer ReLU networks when the data lies on a low-dimensional linear subspace, and suggest two ways to mitigate this vulnerability.
Abstract: Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples.
In this work we focus on two-layer neural networks trained on data that lies on a low-dimensional linear subspace.
We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions.
Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.
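To make the setup concrete, below is a minimal, hypothetical sketch (in PyTorch, not the authors' code or experimental setup): it trains a two-layer ReLU network on synthetic data confined to a low-dimensional linear subspace of $\mathbb{R}^d$ and measures the input-gradient norm in directions orthogonal to that subspace, comparing a standard initialization against a smaller initialization scale and $L_2$ regularization (weight decay). All dimensions, hyperparameters, labels, and function names are illustrative assumptions.

```python
# Illustrative sketch only: train a 2-layer ReLU network on data lying in a
# k-dimensional linear subspace of R^d, then measure the input-gradient norm
# in directions orthogonal to the subspace (a proxy for non-robustness there).
import torch

torch.manual_seed(0)
d, k, n, width = 50, 5, 200, 100  # ambient dim, subspace dim, samples, hidden width

# Orthonormal basis U of a random k-dimensional subspace; data x = U z.
U, _ = torch.linalg.qr(torch.randn(d, k))
z = torch.randn(n, k)
x = z @ U.T
y = (torch.randint(0, 2, (n,)) * 2 - 1).float()  # arbitrary +/-1 labels for illustration

def train(init_scale=1.0, weight_decay=0.0, steps=2000, lr=0.05):
    # Two-layer ReLU network: x -> relu(x W1^T + b1) w2
    w1 = (torch.randn(width, d) * init_scale / d ** 0.5).requires_grad_()
    b1 = torch.zeros(width, requires_grad=True)
    w2 = (torch.randn(width) * init_scale / width ** 0.5).requires_grad_()
    opt = torch.optim.SGD([w1, b1, w2], lr=lr, weight_decay=weight_decay)
    for _ in range(steps):
        opt.zero_grad()
        out = torch.relu(x @ w1.T + b1) @ w2
        loss = torch.nn.functional.softplus(-y * out).mean()  # logistic loss
        loss.backward()
        opt.step()
    return w1, b1, w2

def orthogonal_grad_norm(w1, b1, w2):
    """Average norm of the input gradient projected off the data subspace."""
    xg = x.detach().clone().requires_grad_(True)
    out = torch.relu(xg @ w1.T + b1) @ w2
    (g,) = torch.autograd.grad(out.sum(), xg)
    g_orth = g - g @ U @ U.T  # remove the in-subspace component
    return g_orth.norm(dim=1).mean().item()

for name, kwargs in [("default init", {}),
                     ("small init", {"init_scale": 0.1}),
                     ("L2 regularization", {"weight_decay": 1e-2})]:
    print(name, orthogonal_grad_norm(*train(**kwargs)))
```

Under the paper's claims one would expect the orthogonal gradient norm to shrink with a smaller initialization scale or with weight decay, though the exact numbers here depend entirely on the assumed synthetic setup.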
Supplementary Material: pdf
Submission Number: 1643