SmoothSpike: Spiking Transformer with Learnable Hadamard Transformation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 spotlight | Everyone | CC BY 4.0
Abstract: Spiking Neural Networks (SNNs) have attracted growing attention due to their sparse spike-based communication and inherent temporal dynamics. However, their discrete information representation fundamentally limits expressiveness, resulting in a notable performance gap relative to Artificial Neural Networks (ANNs) on language modeling tasks. In this paper, we show that this gap is rooted in a spike saturation-induced information homogenization problem: within a bounded time window, distinct high-amplitude inputs converge to identical spike counts, compressing neural representations and impairing fine-grained semantic discrimination across layers. To address this, we propose SmoothSpike, which applies a randomized Hadamard transformation to smooth pre-activation inputs, and we prove that this transformation bounds the maximum input to $\mathcal{O}(\sqrt{\frac{\log n}{n}})$ with high probability. To further improve adaptability across varying input distributions, we extend the fixed transformation within SmoothSpike to a learnable orthogonal matrix updated via Newton-Schulz iterations, which can be fused into model weights at inference with no additional overhead. Experiments on the GLUE benchmark show that SmoothSpike effectively reduces information homogenization, yielding an 8.2\% average improvement over the Spikingformer baseline without compromising the efficiency inherent to spike-driven computation. These results advance the prospects for energy-efficient and high-performance language modeling on edge devices. Code is available at https://github.com/CayleyZ/SmoothSpike.
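The smoothing effect of a randomized Hadamard transformation can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fwht`, `randomized_hadamard`), the unit-norm test vector, and the use of a normalized fast Walsh-Hadamard transform with random sign flips are assumptions chosen for illustration. The comparison against $\sqrt{\log n / n}$ is only indicative, since the constant hidden in the $\mathcal{O}(\cdot)$ bound is not given in the abstract.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    y = list(x)
    n = len(y)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthogonal
    return [v * scale for v in y]


def randomized_hadamard(x, signs):
    """Apply random sign flips, then the normalized Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, x)])


rng = random.Random(0)
n = 1024
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

# Hypothetical unit-norm input whose energy sits in two large outliers.
x = [0.0] * n
x[3], x[100] = 0.8, 0.6

y = randomized_hadamard(x, signs)
max_in = max(abs(v) for v in x)
max_out = max(abs(v) for v in y)
norm_out = math.sqrt(sum(v * v for v in y))

print(f"max |x| = {max_in:.4f}, max |y| = {max_out:.4f}, ||y|| = {norm_out:.4f}")
print(f"sqrt(log n / n) = {math.sqrt(math.log(n) / n):.4f}")
```

Because the transform is orthogonal, the norm of the vector is preserved while the energy of the outliers is spread across all coordinates, which is why the largest entry shrinks.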
Lay Summary: Modern language models are powerful but costly to run, especially on small or energy-limited devices. Spiking neural networks are a promising alternative because they communicate with brief binary pulses, but this same discrete signaling makes them less accurate on language tasks. This paper shows that a key reason is saturation: when a spiking neuron receives several large but different inputs, it may produce the same maximum number of spikes for all of them. As a result, useful distinctions between words or sentences can be blurred as information moves through the network. We introduce SmoothSpike, a method that rotates and spreads each input signal before it reaches spiking neurons, reducing extreme values that cause saturation. The rotation is learnable, so it can adapt to different layers and data distributions, and it can be folded into the model weights at inference time without adding extra computation. On standard language understanding benchmarks, SmoothSpike consistently improves spiking Transformer performance while preserving spike-based efficiency. These results suggest that smoothing internal representations can make energy-efficient spiking language models more accurate and practical for resource-limited AI systems.
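The saturation effect described above can be reproduced with a toy neuron. The sketch below assumes a simple integrate-and-fire neuron with a soft reset, a constant input current, a threshold of 1.0, and a window of 4 time steps; none of these choices are taken from the paper, and `spike_count` is a hypothetical helper name.

```python
def spike_count(current, timesteps=4, threshold=1.0):
    """Count spikes of a toy integrate-and-fire neuron driven by a constant input."""
    v = 0.0
    count = 0
    for _ in range(timesteps):
        v += current            # integrate the input
        if v >= threshold:      # at most one spike per time step
            count += 1
            v -= threshold      # soft reset: subtract the threshold
    return count


inputs = (0.25, 0.5, 1.5, 2.0, 3.0)
counts = [spike_count(c) for c in inputs]
print(dict(zip(inputs, counts)))
```

In this toy setting the small inputs 0.25 and 0.5 remain distinguishable by their spike counts, while 1.5, 2.0, and 3.0 all hit the maximum of one spike per time step and become indistinguishable.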
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/CayleyZ/SmoothSpike
Primary Area: Applications->Neuroscience, Cognitive Science
Keywords: Spiking Neural Network, Spiking Transformers
Originally Submitted PDF: pdf
Submission Number: 31556