Explaining the origin of adversarial attacks using in-distribution adversarial examples.

TMLR Paper 1301 Authors

17 Jun 2023 (modified: 20 Dec 2023) · Rejected by TMLR
Abstract: Understanding why deep neural networks are susceptible to adversarial attacks remains an open question. While several theories have been proposed, it is unclear which of them are valid in practice and relevant for object recognition. Here, we propose using the newly discovered phenomenon of in-distribution adversarial attacks to compare different theories, and we highlight one theory that can explain the presence of these more stringent attacks within the training distribution: the ground-truth boundary theory. The key insight behind this theory is that in high dimensions, most data points lie close to the ground-truth class boundaries. While this has been shown theoretically for some simple data distributions, it is unclear whether these results carry over to object recognition in practice. Our results demonstrate the existence of in-distribution adversarial examples for object recognition. They provide evidence supporting the ground-truth boundary theory, which attributes adversarial examples to the proximity of data to ground-truth class boundaries, and call into question other theories that do not account for this more stringent definition of adversarial attacks. These experiments are enabled by our novel gradient-free approach based on evolutionary strategies (ES) for finding in-distribution adversarial examples, which we call CMA-Search.
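The abstract describes CMA-Search only at a high level; a minimal sketch of the general idea of a gradient-free, CMA-ES-based search is given below. It assumes a hypothetical parametric scene generator `render_scene` and classifier `classifier` (placeholders, not the authors' code), and uses the open-source `cma` package; this is an illustration of the technique class, not the paper's implementation.

```python
# Hypothetical sketch: use CMA-ES to perturb the *scene parameters* of an
# in-distribution sample (e.g. camera position, lighting) until the
# classifier's prediction flips, while staying within the data distribution.
import numpy as np
import cma  # pip install cma


def render_scene(params: np.ndarray) -> np.ndarray:
    """Placeholder renderer: maps scene parameters to an image array."""
    raise NotImplementedError


def classifier(image: np.ndarray) -> np.ndarray:
    """Placeholder model: returns class probabilities for an image."""
    raise NotImplementedError


def attack_loss(params: np.ndarray, true_label: int) -> float:
    """Fitness for CMA-ES: lower values mean lower confidence in the true class."""
    probs = classifier(render_scene(params))
    return float(probs[true_label])


def cma_search(start_params, true_label, sigma=0.05, max_iters=50):
    """Search near `start_params` for parameters that change the prediction."""
    es = cma.CMAEvolutionStrategy(start_params, sigma)
    for _ in range(max_iters):
        candidates = es.ask()
        losses = [attack_loss(np.asarray(c), true_label) for c in candidates]
        es.tell(candidates, losses)
        best = np.asarray(es.best.x)
        if classifier(render_scene(best)).argmax() != true_label:
            return best  # candidate in-distribution adversarial parameters
    return None
```

Because the search operates on generative scene parameters rather than raw pixels, the resulting examples remain within the training distribution by construction, which is what distinguishes this setting from standard pixel-space attacks.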
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Feedback from all three reviewers has been addressed and additional experiments have been added. The additional experiments measure the attack rate using a consistent metric across all datasets and, following suggestions from the second round of reviews, also measure the attack rate with a larger sample size. We have updated the text to clarify details regarding the attack rate and the computational efficiency of CMA-ES, among other points. We have also moved several details to the supplement to shorten the paper, as suggested in the reviews. Changes made in the first revision are highlighted in yellow, and new changes made in this second revision are in blue text for ease of reference.
Assigned Action Editor: ~Cho-Jui_Hsieh1
Submission Number: 1301