Preliminary Analysis of Adversarial Susceptibility in Neural Networks via Analogy to Chaos-Theoretic Sensitive Dependence
Abstract: Although the susceptibility of neural networks to adversarial attacks has been demonstrated experimentally many times, and a vast array of attacks and defenses has been developed around this phenomenon, theoretical analysis of why these models succumb to such attacks in the first place remains limited. This paper uses ideas from chaos theory to explain, analyze, and quantify the degree to which neural networks are susceptible to, or robust against, adversarial attacks, and follows with preliminary experiments demonstrating the validity of our ideas. To aid in experimental analysis, we present a new metric, the "susceptibility ratio," denoted $\hat \Psi(h, \theta)$, which captures how greatly a model's output is changed by perturbations to a given input.
Our theoretical and experimental results show that susceptibility to attack grows significantly with the depth of the model, which has safety implications for the design of neural networks intended for production environments. We provide experimental evidence of the relationship between $\hat \Psi$ and the post-attack accuracy of classification models, as well as a discussion of its application to tasks lacking hard decision boundaries.
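As a rough, hypothetical illustration of the kind of quantity a susceptibility ratio measures (the paper's actual definition of $\hat \Psi(h, \theta)$ is not reproduced in this abstract), one could estimate how strongly a trained model amplifies small input perturbations by averaging the ratio of output displacement to input displacement over random perturbations of a given input. The sketch below assumes PyTorch; `model`, `x`, `epsilon`, and `n_samples` are illustrative placeholders rather than quantities defined in the paper.

```python
# Hypothetical sketch: empirically estimating how much a model amplifies
# small input perturbations. This is NOT the paper's definition of
# \hat{\Psi}(h, \theta), only an illustration of the general idea.
import torch

def empirical_susceptibility(model: torch.nn.Module,
                             x: torch.Tensor,
                             epsilon: float = 1e-3,
                             n_samples: int = 32) -> float:
    """Average ratio of output displacement to input displacement over
    random perturbations of norm `epsilon` around the input `x`."""
    model.eval()
    with torch.no_grad():
        y = model(x)
        ratios = []
        for _ in range(n_samples):
            # Draw a random direction and rescale it to norm epsilon.
            delta = torch.randn_like(x)
            delta = epsilon * delta / delta.norm()
            y_pert = model(x + delta)
            # Output change divided by input change for this perturbation.
            ratios.append(((y_pert - y).norm() / delta.norm()).item())
    return sum(ratios) / len(ratios)
```

A value much larger than 1 would indicate that small perturbations are strongly amplified by the model, which is the intuition connecting adversarial susceptibility to chaos-theoretic sensitive dependence on initial conditions.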
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=lPw8Xdzw5f
Changes Since Last Submission: I have edited the prose, added discussions connecting my work to algorithmic stability and the Lipschitz constant, cleaned up notation, cited more related work, and added further explanations in a couple of areas.
Assigned Action Editor: ~W_Ronny_Huang1
Submission Number: 1489