Abstract: The existence of adversarial attacks on machine learning models that are imperceptible to a human remains largely a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible to a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that, even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold directions of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_\infty$ attack strengths of on- and off-manifold attacks and the dimension gap.
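To make the on-/off-manifold distinction concrete, the following is a minimal numpy sketch (not the paper's construction or proof): data lie on a low-dimensional subspace of a higher-dimensional ambient space, a 2-layer ReLU network is trained on clean samples, and the input gradient at a test point is decomposed into its on-manifold and off-manifold components. All dimensions, sample sizes, and learning rates are arbitrary illustrative choices.

```python
# Illustrative sketch only: data on a d-dim subspace of R^D, clean training of a
# 2-layer ReLU network, and decomposition of the input gradient into the
# on-manifold and off-manifold directions. Hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

d, D, n = 2, 50, 400                                # intrinsic dim, ambient dim, samples
U, _ = np.linalg.qr(rng.standard_normal((D, d)))    # orthonormal basis of the data "manifold"

z = rng.standard_normal((n, d))                     # intrinsic coordinates
y = np.sign(z[:, 0])                                # labels depend only on intrinsic data
X = z @ U.T                                         # embed samples into ambient space R^D

m = 64                                              # hidden width
W = rng.standard_normal((m, D)) / np.sqrt(D)
a = rng.standard_normal(m) / np.sqrt(m)

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)                    # ReLU hidden layer, shape (n, m)
    return H, H @ a                                 # network output f(x) = a^T relu(W x)

lr = 0.05
for _ in range(2000):                               # clean training with logistic loss
    H, f = forward(X, W, a)
    g = -y / (1.0 + np.exp(y * f))                  # per-sample d(loss)/d(f)
    grad_a = H.T @ g / n
    grad_W = ((g[:, None] * (H > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

# Input gradient at an on-manifold test point, split into on-/off-manifold parts.
x0 = rng.standard_normal(d) @ U.T
h0 = np.maximum(W @ x0, 0.0)
grad_x = W.T @ ((h0 > 0) * a)                       # d f(x)/dx for the trained network
P_on = U @ U.T                                      # projector onto the data manifold
g_on, g_off = P_on @ grad_x, (np.eye(D) - P_on) @ grad_x
print("on-manifold  gradient norm:", np.linalg.norm(g_on))
print("off-manifold gradient norm:", np.linalg.norm(g_off))
```

An adversary constrained to the manifold can only exploit the on-manifold component of this gradient, whereas an unconstrained (off-manifold) adversary can also exploit the remaining D - d ambient directions, which is the dimension-gap effect the abstract refers to.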