Abstract: A substantial body of empirical work documents the lack of robustness of deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations, and in multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations.
More precisely, given a neural network f(·; θ) with random weights θ and a feature vector x, we show that an adversarial example x′ can be found with high probability along the direction of the gradient ∇xf(x; θ). Our proof is based on a Gaussian conditioning technique. Instead of proving that f is approximately linear in a neighborhood of x, we characterize the joint distribution of f(x; θ) and f(x′; θ) for x′ = x − s(x)∇xf(x; θ), where s(x) = sign(f(x; θ)) · s_d for some positive step size s_d.
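The construction in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: the two-layer ReLU network, its width, the Gaussian weight scaling, and the heuristic choice of step size s_d are all assumptions made for illustration. It forms the candidate x′ = x − sign(f(x; θ)) · s_d · ∇xf(x; θ) and checks whether the sign of the output flips.

import numpy as np

rng = np.random.default_rng(0)
d, m = 512, 2048  # input dimension and hidden width (arbitrary choices)

# Random weights theta = (W, a); the 1/sqrt scaling is an illustrative convention.
W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))
a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)

def f(x):
    """Two-layer ReLU network f(x; theta) = a^T relu(W x)."""
    return a @ np.maximum(W @ x, 0.0)

def grad_f(x):
    """Gradient of f with respect to the input x."""
    active = (W @ x > 0.0).astype(float)  # ReLU derivative at the hidden layer
    return (a * active) @ W

x = rng.normal(size=d) / np.sqrt(d)  # a generic feature vector
g = grad_f(x)

# Heuristic step size: if f were exactly linear near x, a step of
# |f(x)| / ||g||^2 along -sign(f(x)) g would cancel the output; we take twice that.
s_d = 2.0 * abs(f(x)) / (np.linalg.norm(g) ** 2 + 1e-12)

# Adversarial candidate along the gradient direction.
x_adv = x - np.sign(f(x)) * s_d * g

print(f"f(x)     = {f(x):+.4f}")
print(f"f(x_adv) = {f(x_adv):+.4f}")  # the sign typically flips
print(f"relative perturbation size = {np.linalg.norm(x_adv - x) / np.linalg.norm(x):.4f}")

The point of the paper's analysis is precisely that such a gradient step succeeds with high probability without assuming f is approximately linear near x; the sketch only shows the perturbation being constructed, not the Gaussian conditioning argument.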