Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: adversarial attacks, kernel, neural networks
Abstract: Adversarial examples, inputs imperceptibly modified to mislead machine learning models, have raised concerns about the robustness of modern neural architectures in safety-critical applications. In this paper, we propose a unified mathematical framework for understanding adversarial examples in neural networks, corroborating Szegedy et al.'s original conjecture that such examples are exceedingly rare despite lying close to nearly every test case. By exploiting Mercer's decomposition theorem, we characterise adversarial examples as those producing near-zero Mercer eigenvalues in the empirical kernel associated with a trained neural network. Consequently, generating an adversarial attack, by any known technique, can be understood as driving the corresponding eigenvalues of the empirical kernel towards zero. We rigorously prove this characterisation for trained neural networks that achieve interpolation, under mild assumptions on the architecture, thus providing a mathematical explanation for the apparent contradiction between neural networks' excellent generalisation and their vulnerability to adversarial attacks. We verify empirically that adversarial examples generated for both fully-connected and convolutional architectures, using the widely known DeepFool algorithm and the more recent Fast Adaptive Boundary (FAB) method, consistently shift the distribution of Mercer's eigenvalues towards zero, in strong agreement with the predictions of our theory.
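For intuition, here is a minimal sketch of the kind of spectral check the abstract describes. It instantiates the empirical kernel as the Gram matrix of a network's penultimate-layer features, which is one plausible construction; the paper's exact kernel, feature map, and attack pipeline may differ. The feature map, dimensions, and the Gaussian perturbation standing in for a DeepFool/FAB attack are all illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def empirical_kernel_eigenvalues(feature_map, x):
    """Eigenvalues of the empirical kernel K_ij = <phi(x_i), phi(x_j)> / n,
    with phi a feature map extracted from a network (an assumed construction)."""
    with torch.no_grad():
        phi = feature_map(x).flatten(start_dim=1)  # n x d feature matrix
    K = phi @ phi.T / x.shape[0]                   # n x n empirical kernel matrix
    return torch.linalg.eigvalsh(K)                # eigenvalues in ascending order

# Toy usage with a random (untrained) feature map, for illustration only.
feature_map = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
x_clean = torch.randn(32, 784)
# Gaussian noise as a hypothetical stand-in for a DeepFool/FAB perturbation.
x_adv = x_clean + 0.03 * torch.randn_like(x_clean)
print(empirical_kernel_eigenvalues(feature_map, x_clean)[:5])
print(empirical_kernel_eigenvalues(feature_map, x_adv)[:5])
```

Under the paper's claim, the spectrum computed on genuinely adversarial inputs to a trained, interpolating network would concentrate near zero relative to the clean spectrum; the random perturbation above merely shows where such a comparison would be made.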
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10147