A Survey on Transferability of Adversarial Examples Across Deep Neural Networks

Published: 06 May 2024, Last Modified: 06 May 2024
Accepted by TMLR
Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also exposed a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models into making erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is the transferability of adversarial examples: perturbations crafted for one model can deceive another, often one with a different architecture. This property enables ``black-box'' attacks, which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of adversarial transferability. We categorize existing methodologies for enhancing transferability and discuss the fundamental principles guiding each approach. While the predominant body of research concentrates on image classification, we also extend the discussion to other vision tasks and beyond. Finally, challenges and opportunities are discussed, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.
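The abstract's core claim, that a perturbation crafted against one model can fool a different model, can be illustrated with a minimal sketch. The models and numbers below are hypothetical toy constructions (two linear classifiers with similar decision boundaries), not methods from the survey; the perturbation is a single FGSM-style step against the surrogate only.

```python
# Toy illustration of adversarial transferability (hypothetical models):
# an FGSM-style perturbation crafted against a "surrogate" linear
# classifier also flips the prediction of a separate "target" classifier.

def predict(w, b, x):
    """Linear binary classifier: returns +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def sign(v):
    return 1.0 if v >= 0 else -1.0

def fgsm(w, x, y, eps):
    """One FGSM step against a linear model: the gradient of the margin
    y * (w . x) w.r.t. x is y * w, so we move x along -y * sign(w)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

# Two models with similar (but not identical) decision boundaries.
w_surrogate, b_surrogate = [1.0, 2.0], 0.0
w_target, b_target = [1.2, 1.8], 0.0

x, y = [0.5, 0.5], 1                  # clean input, true label +1
x_adv = fgsm(w_surrogate, x, y, eps=0.8)

print(predict(w_surrogate, b_surrogate, x))      # +1: clean, correct
print(predict(w_target, b_target, x))            # +1: clean, correct
print(predict(w_surrogate, b_surrogate, x_adv))  # -1: surrogate fooled
print(predict(w_target, b_target, x_adv))        # -1: attack transfers
```

The attack never queries the target's weights, which is what makes transferability the basis for black-box attacks: similarity between the two models' decision boundaries is enough for the surrogate-crafted perturbation to carry over.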
Submission Length: Long submission (more than 12 pages of main content)
Code: https://github.com/JindongGu/Awesome_Adversarial_Transferability
Assigned Action Editor: ~Sanghyun_Hong1
Submission Number: 2046