Towards High-performance Spiking Transformers from ANN to SNN Conversion

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing, and robustness. There are two main approaches to constructing SNNs: direct training methods require substantial memory, while conversion methods offer a simpler and more efficient alternative. However, current conversion methods focus mainly on converting convolutional neural networks (CNNs) to SNNs; converting Transformers is challenging because of their non-linear modules. In this paper, we propose an Expectation Compensation Module to preserve accuracy through the conversion. The core idea is to use information from the previous T time steps to calculate the expected output at time step T. We also propose a Multi-Threshold Neuron and a corresponding Parallel Parameter Normalization to address the large number of time steps otherwise needed for high accuracy, reducing network latency and power consumption. Our experimental results demonstrate state-of-the-art performance: for example, we achieve a top-1 accuracy of 88.60% with only a 1% accuracy loss using 4 time steps, while consuming only 35% of the original Transformer's power. To our knowledge, this is the first ANN-to-SNN conversion for Spiking Transformers that achieves high accuracy, low latency, and low power consumption on complex datasets.
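Below is a minimal, hedged sketch of one plausible reading of the "Expectation Compensation" idea described in the abstract: a non-linear ANN module (e.g. GELU or softmax) is wrapped so that at time step t it emits only the part of t·f(mean input so far) that has not already been emitted at earlier steps. The class name `ExpectationCompensation`, its buffers, and the demo are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch (assumed, not the authors' code): wrap a non-linear module f so
# that the spiking network's accumulated output over T steps matches the ANN's
# activation on the rate-coded (averaged) input.
import torch
import torch.nn as nn


class ExpectationCompensation(nn.Module):
    """At step t, estimate the expected ANN output f((x_1+...+x_t)/t) from the
    cumulative input and emit only the compensation not yet emitted."""

    def __init__(self, f: nn.Module):
        super().__init__()
        self.f = f
        self.reset()

    def reset(self):
        self.input_sum = None   # running sum of inputs x_1 + ... + x_t
        self.output_sum = None  # total output already emitted
        self.t = 0

    def forward(self, x_t: torch.Tensor) -> torch.Tensor:
        self.t += 1
        # information from the previous time steps
        self.input_sum = x_t if self.input_sum is None else self.input_sum + x_t
        expected_total = self.t * self.f(self.input_sum / self.t)
        if self.output_sum is None:
            self.output_sum = torch.zeros_like(expected_total)
        out_t = expected_total - self.output_sum  # emit only the compensation
        self.output_sum = self.output_sum + out_t
        return out_t


if __name__ == "__main__":
    torch.manual_seed(0)
    f = nn.GELU()
    ecm = ExpectationCompensation(f)
    xs = [torch.randn(3) for _ in range(4)]            # 4 "time steps" of input
    total = sum(ecm(x) for x in xs)
    # Accumulated spiking output equals T * f(mean input), i.e. the ANN activation
    print(torch.allclose(total, 4 * f(sum(xs) / 4)))   # True
```

Under this reading, summing the module's outputs over T steps and dividing by T exactly reproduces the ANN's non-linear activation on the average input, which is what allows accuracy to be preserved with few time steps.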
Primary Subject Area: [Generation] Multimedia Foundation Models
Relevance To Conference: Our paper proposes a new method to convert Transformers into Spiking Neural Networks (SNNs). Converting Transformers to SNNs offers several advantages that can significantly advance multimedia and multimodal processing:
1. Computational efficiency: SNNs emulate the biological brain's spike-based transmission, which reduces the number of required operations and conserves energy and resources. This is especially valuable when processing multimedia and multimodal data with large models such as Transformers.
2. Temporal data handling: Transformers' sequence-modeling capabilities combined with the temporal dynamics of SNNs allow more natural handling of sequential data such as video and audio streams, which is crucial for multimedia processing.
3. Low latency: the event-driven nature of SNNs provides low latency in multimedia applications, improving user experience and system response time.
4. Hardware synergy: SNNs are highly compatible with neuromorphic hardware that mimics the human brain, making them well suited for parallel and distributed multimedia processing tasks.
In summary, converting Transformers to SNNs offers a new way to process multimedia and multimodal content with high energy efficiency, real-time responsiveness, and sensitivity to temporal sequences. This can unlock new domains in complex data processing, especially in applications requiring real-time analysis and interaction.
Supplementary Material: zip
Submission Number: 1262