Hybrid SNN-Transformer Networks for Event-Based, Energy-Efficient Large-Scale Learning

Published: 22 Sept 2025, Last Modified: 22 Sept 2025, Venue: WiML @ NeurIPS 2025, License: CC BY 4.0
Keywords: Energy-Efficient systems, Large-Scale Learning, Transformer Networks
Abstract: The unsustainable energy demands of conventional deep learning models ($E \propto N^2$ for $N$ tokens) and the scalability limitations of Spiking Neural Networks (SNNs) motivate our Hybrid SNN-Transformer Network (HST-Net), which combines event-driven efficiency with Transformer scalability. At its core, HST-Net introduces a spiking self-attention mechanism with a dynamic threshold ($V_{th}(t) = \alpha \sum_i w_i s_i(t) + \beta \exp(-t/\tau)$) that reduces energy by 5.8$\times$ compared to dense attention layers while preserving performance. We overcome SNN training challenges via a hybrid framework combining surrogate gradients ($\sigma'(x) = \gamma \max(0, 1-|x|)$) and modified backpropagation-through-time (BPTT) with learnable time constants ($\tau_{\text{learn}}$), enabling 3.2$\times$ faster few-shot convergence than pure SNNs. Neuromorphic-native optimizations, including asynchronous token processing for event-based vision (97.2% accuracy on N-MNIST) and hardware-aware quantization ($W_{\text{quant}} = \lfloor W/\Delta \rceil \cdot \Delta$), ensure deployability on edge devices. HST-Net achieves state-of-the-art efficiency at scale (100M+ parameters) with applications in sustainable LLMs ($\text{CO}_2 \downarrow 70\%$), neuromorphic hardware, and computational neuroscience. Code and benchmarks are open-sourced to accelerate research in energy-efficient AI.
Submission Number: 9
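
The abstract quotes three closed-form components: the dynamic spiking threshold $V_{th}(t)$, the triangular surrogate gradient $\sigma'(x)$, and round-to-nearest weight quantization. The short NumPy sketch below shows how these expressions could fit together in a single spiking-attention step. The function names, the membrane-potential setup, and all parameter values (alpha, beta, tau, gamma, delta) are illustrative assumptions for exposition, not the released HST-Net implementation.

    # Illustrative sketch of the formulas quoted in the abstract (not the
    # official HST-Net code; parameter values and the membrane update are
    # assumptions for demonstration only).
    import numpy as np

    def dynamic_threshold(spikes, weights, t, alpha=1.0, beta=0.5, tau=20.0):
        # V_th(t) = alpha * sum_i w_i s_i(t) + beta * exp(-t / tau)
        return alpha * np.dot(weights, spikes) + beta * np.exp(-t / tau)

    def surrogate_grad(x, gamma=0.3):
        # sigma'(x) = gamma * max(0, 1 - |x|): triangular surrogate used in
        # place of the non-differentiable spike function during BPTT.
        return gamma * np.maximum(0.0, 1.0 - np.abs(x))

    def quantize(w, delta=0.05):
        # W_quant = round(W / Delta) * Delta  (hardware-aware uniform quantization)
        return np.round(w / delta) * delta

    # Toy spiking-attention step: membrane potentials cross a data-dependent
    # threshold, producing a sparse binary spike pattern instead of dense scores.
    rng = np.random.default_rng(0)
    n_tokens = 8
    membrane = rng.normal(size=n_tokens)                  # assumed membrane potentials
    prev_spikes = (rng.random(n_tokens) > 0.7).astype(float)
    w = quantize(rng.normal(size=n_tokens))               # quantized synaptic weights

    v_th = dynamic_threshold(prev_spikes, w, t=5.0)
    spikes = (membrane >= v_th).astype(float)             # event-driven attention output
    grads = surrogate_grad(membrane - v_th)               # gradient signal for BPTT

    print("V_th:", round(v_th, 3))
    print("spikes:", spikes)
    print("surrogate grads:", np.round(grads, 3))

In an event-driven implementation, only the tokens that spike would propagate activity downstream, which is where the claimed savings relative to dense attention would come from; the surrogate gradient is what makes the binary spike decision trainable with BPTT.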