Keywords: Large Language Models; Efficient Machine Learning; Spiking Neural Networks
Abstract: Transformer-based large language models (LLMs) deliver state-of-the-art accuracy but demand heavy floating-point computation and memory bandwidth, making them impractical for low-power devices. Spiking neural networks (SNNs) promise efficiency through sparse, event-driven communication, yet current ANN-to-SNN conversion pipelines still rely on floating-point softmax, RMSNorm, and SwiGLU/SiLU, or fall back to ReLU-compatible spiking surrogates that often require fine-tuning to recover accuracy. This work introduces a family of spike-friendly approximations that collectively replace softmax, RMSNorm, and SwiGLU/SiLU. Each operator is built from simple shifts, comparisons, and integer additions, requires no lookup tables or floating-point units, and integrates seamlessly into existing conversion pipelines without fine-tuning the original weights. We provide theoretical error bounds and integrate the approximations into the SNN conversion pipeline for LLaMA models. Experiments show that the resulting fully spike-driven LLMs match the performance of large SNN models that retain floating-point activations, while avoiding post-conversion fine-tuning. These results pave a practical path toward deploying large Transformers on neuromorphic hardware.
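The abstract describes operators built only from shifts, comparisons, and integer additions. The paper's actual approximations are not given here; as a purely illustrative sketch of the general idea, a fixed-point SiLU replacement could use a shift-based hard-sigmoid (0.5 + x/4, clamped to [0, 1]), avoiding any floating-point unit. All function names, the Q-format, and the constants below are assumptions for illustration only:

```python
# Illustrative sketch only: a shift-and-compare, fixed-point approximation
# of SiLU(x) = x * sigmoid(x). This is NOT the paper's operator; it merely
# demonstrates the flavor of FPU-free, spike-friendly arithmetic.

def hard_sigmoid_q(x_q: int, frac_bits: int = 8) -> int:
    """Approximate sigmoid(x) as clamp(0.5 + x/4, 0, 1) in Q-format fixed point."""
    one = 1 << frac_bits           # 1.0 in Q-format
    half = one >> 1                # 0.5
    y = half + (x_q >> 2)          # x/4 via arithmetic right shift
    if y < 0:                      # clamping uses comparisons only
        return 0
    if y > one:
        return one
    return y

def silu_q(x_q: int, frac_bits: int = 8) -> int:
    """Approximate SiLU via the shift-based sigmoid; one integer multiply."""
    return (x_q * hard_sigmoid_q(x_q, frac_bits)) >> frac_bits
```

For example, with 8 fractional bits, `silu_q(256)` (i.e. x = 1.0) yields 192 (0.75), close to the exact SiLU(1.0) ≈ 0.73; large negative inputs saturate to 0.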
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15406