Keywords: Spiking Neural Networks, Spiking self-attention, Spiking transformer
Abstract: Integrating Spiking Neural Networks (SNNs) with Transformer architectures offers a promising pathway to balancing energy efficiency and performance, particularly for edge vision applications. However, existing Spiking Transformers face two critical challenges: i) a substantial performance gap relative to their Artificial Neural Network (ANN) counterparts, and ii) considerable memory overhead. Our theoretical analysis and empirical evidence indicate that these limitations arise from the unfocused global attention paradigm of Spiking Self-Attention (SSA) and the storage cost of large attention matrices. Inspired by the localized receptive fields and membrane potential dynamics of biological visual neurons, we propose LRF-Dyn, which performs attention computation with spiking neurons endowed with localized receptive fields. Specifically, we integrate an LRF mechanism into SSA, enabling the model to allocate greater attention to neighboring regions and thereby enhance its local modeling capacity. Moreover, LRF-Dyn approximates the charge-fire-reset dynamics of spiking neurons within the LRF-SSA, substantially reducing memory requirements during inference. Extensive experiments on visual tasks confirm that our method lowers memory overhead while delivering significant performance improvements.
These results establish LRF-Dyn as a key component for achieving energy-efficient Spiking Transformers.
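For readers unfamiliar with the building blocks named in the abstract, the following minimal Python/PyTorch sketch illustrates the standard charge-fire-reset step of a leaky integrate-and-fire neuron and a generic local-attention mask. It is not the submission's LRF-Dyn method; all function names, parameters (e.g. `tau`, `v_threshold`, `window`), and the masking strategy are illustrative assumptions.

```python
import torch

def lif_step(potential, input_current, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """One charge-fire-reset step of a leaky integrate-and-fire (LIF) neuron.

    Generic dynamics commonly used in spiking transformers; the paper's
    memory-saving approximation of these dynamics is not reproduced here.
    """
    # Charge: leaky integration of the input current into the membrane potential.
    potential = potential + (input_current - potential) / tau
    # Fire: emit a binary spike wherever the potential reaches the threshold.
    spikes = (potential >= v_threshold).float()
    # Reset: hard-reset the membrane potential of neurons that fired.
    potential = torch.where(spikes.bool(),
                            torch.full_like(potential, v_reset),
                            potential)
    return spikes, potential

def local_attention_mask(num_tokens, window=3):
    """Boolean mask restricting attention to a local token neighborhood.

    A generic stand-in for a localized receptive field (hypothetical, not the
    paper's LRF); apply it to attention logits before the spiking attention step,
    e.g. logits.masked_fill(~mask, float('-inf')).
    """
    idx = torch.arange(num_tokens)
    return (idx[None, :] - idx[:, None]).abs() <= window
```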
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 15219