Attacking the Spike: On the Security of Spiking Neural Networks to Adversarial Examples

TMLR Paper2761 Authors

27 May 2024 (modified: 22 Oct 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the robustness of SNNs to adversarial examples remains relatively understudied. In this work, we focus on advancing the adversarial attack side of SNNs and make three major contributions. First, we show that the success of white-box adversarial attacks on SNNs depends heavily on the underlying surrogate gradient estimation technique, even for adversarially trained SNNs. Second, using the best single surrogate gradient estimation technique, we analyze the transferability of adversarial examples between SNNs and other state-of-the-art architectures, such as Vision Transformers (ViTs) and CNNs. Our analyses reveal two key areas where SNN adversarial attacks can be improved: no existing white-box attack exploits multiple surrogate gradient estimators for SNNs, and no single-model attack reliably generates adversarial examples that are misclassified by both SNN and non-SNN models simultaneously. As our third contribution, we develop a new attack, the Mixed Dynamic Spiking Estimation (MDSE) attack, to address these issues. MDSE uses a dynamic gradient estimation scheme to fully exploit multiple surrogate gradient estimator functions, and it generates adversarial examples capable of fooling both SNN and non-SNN models simultaneously. The MDSE attack is up to $91.4\%$ more effective on SNN/ViT model ensembles and provides a $3\times$ boost in attack effectiveness on adversarially trained SNN ensembles, compared to conventional white-box attacks such as Auto-PGD. Our experiments are broad and rigorous, covering three datasets (CIFAR-10, CIFAR-100 and ImageNet) and nineteen classifier models (seven for each CIFAR dataset and five for ImageNet). We will release a fully public code repository for the models and attacks upon publication.
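For context, the sketch below illustrates what a surrogate gradient estimator is in an SNN: the forward pass applies the non-differentiable Heaviside spike function, while the backward pass substitutes a smooth surrogate derivative, which is the component that white-box gradient attacks on SNNs rely on. This is a generic, illustrative PyTorch example under assumed choices, not the paper's MDSE implementation; the class name SurrogateSpike, the sigmoid-derivative surrogate, and the sharpness parameter alpha are assumptions for illustration only.

import torch

class SurrogateSpike(torch.autograd.Function):
    # Forward: non-differentiable Heaviside spike.
    # Backward: smooth surrogate gradient (illustrative choice, not the paper's method).
    @staticmethod
    def forward(ctx, membrane_potential, alpha=4.0):
        ctx.save_for_backward(membrane_potential)
        ctx.alpha = alpha
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Derivative of a scaled sigmoid stands in for the Heaviside derivative.
        sig = torch.sigmoid(ctx.alpha * membrane_potential)
        return grad_output * ctx.alpha * sig * (1.0 - sig), None

# Example usage: spikes = SurrogateSpike.apply(membrane_potential - threshold)

Because the forward pass is identical regardless of the surrogate chosen, different surrogate derivatives yield different attack gradients for the same SNN, which is why the choice of estimator can change white-box attack success.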
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have revised the content based on the feedback from the reviewers and incorporated detailed responses in the rebuttals. Additionally, we have included further discussions and experimental results in the revised version.
Assigned Action Editor: ~Robert_Legenstein1
Submission Number: 2761