Abstract: A substantial body of empirical work documents the lack of robustness of deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations, and in multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations.
More precisely, given a neural network f(·; θ) with random weights θ and a feature vector x, we show that an adversarial example x′ can be found with high probability along the direction of the gradient ∇xf(x; θ). Our proof is based on a Gaussian conditioning technique. Instead of proving that f is approximately linear in a neighborhood of x, we characterize the joint distribution of f(x; θ) and f(x′; θ) for x′ = x − s(x)∇xf(x; θ), where s(x) = sign(f(x; θ)) · s_d for some positive step size s_d.
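The construction in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: the two-layer ReLU network, its width, the Gaussian weight scaling, and the heuristic choice of step size s_d are all assumptions made for illustration. It forms the candidate x′ = x − sign(f(x; θ)) · s_d · ∇xf(x; θ) and checks whether the sign of the output flips.

import numpy as np

rng = np.random.default_rng(0)
d, m = 512, 2048  # input dimension and hidden width (arbitrary choices)

# Random weights theta = (W, a); the 1/sqrt scaling is an illustrative convention.
W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))
a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)

def f(x):
    """Two-layer ReLU network f(x; theta) = a^T relu(W x)."""
    return a @ np.maximum(W @ x, 0.0)

def grad_f(x):
    """Gradient of f with respect to the input x."""
    active = (W @ x > 0.0).astype(float)  # ReLU derivative at the hidden layer
    return (a * active) @ W

x = rng.normal(size=d) / np.sqrt(d)  # a generic feature vector
g = grad_f(x)

# Heuristic step size: if f were exactly linear near x, a step of
# |f(x)| / ||g||^2 along -sign(f(x)) g would cancel the output; we take twice that.
s_d = 2.0 * abs(f(x)) / (np.linalg.norm(g) ** 2 + 1e-12)

# Adversarial candidate along the gradient direction.
x_adv = x - np.sign(f(x)) * s_d * g

print(f"f(x)     = {f(x):+.4f}")
print(f"f(x_adv) = {f(x_adv):+.4f}")  # the sign typically flips
print(f"relative perturbation size = {np.linalg.norm(x_adv - x) / np.linalg.norm(x):.4f}")

The point of the paper's analysis is precisely that such a gradient step succeeds with high probability without assuming f is approximately linear near x; the sketch only shows the perturbation being constructed, not the Gaussian conditioning argument.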