Gradient Methods Provably Converge to Non-Robust Networks

Gal Vardi; Gilad Yehudai; Ohad Shamir

Gradient Methods Provably Converge to Non-Robust Networks

Gal Vardi, Gilad Yehudai, Ohad Shamir

Published: 31 Oct 2022, Last Modified: 14 Dec 2022NeurIPS 2022 AcceptReaders: Everyone

Keywords: implicit bias, deep learning theory, robustness

Abstract: Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial $\ell_2$-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly, we show that the well-known implicit bias towards margin maximization induces bias towards non-robust networks, by proving that every network which satisfies the KKT conditions of the max-margin problem is non-robust.

TL;DR: We show that depth-2 neural networks trained under a natural setting are provably non-robust, even when robust networks on the same dataset exist.

Supplementary Material: pdf

10 Replies

Loading