Keywords: Spiking Neural Networks; Spike-driven; Large Language Model
TL;DR: The first energy-efficient and scalable spike-driven large language model
Abstract: Inspired by biological computing mechanisms, Spiking Neural Networks (SNNs), with their spike-driven operations and spatiotemporal dynamics, offer a promising solution for constructing energy-efficient language models. Although prior research has attempted to integrate SNNs with Large Language Models (LLMs), these approaches often suffer from limited performance or low inference efficiency. To tackle these challenges, we propose a Spike-driven Large Language Model (SDLLM) that enables large-scale modeling by eliminating matrix multiplications and relying solely on sparse additions. Specifically, we propose a two-step spike quantization strategy to address the numerous outliers in LLM activation values, significantly mitigating the accuracy loss caused by binary spike trains. To further reduce the spike firing rate, we introduce bidirectional encoding under symmetric quantization, along with a membrane potential clipping mechanism, which together reduce energy consumption without compromising accuracy. Extensive experiments demonstrate that SDLLM performs effectively on both language modeling and commonsense QA tasks. For example, compared to previous spike-based LLMs, our SDLLM reduces energy consumption by 7.8$\times$ and improves commonsense reasoning accuracy by 4.2%. To the best of our knowledge, SDLLM marks the first demonstration of SNNs outperforming quantized ANN models in terms of both performance and energy efficiency in LLM scenarios.
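Since only the abstract is available on this page, the following minimal PyTorch sketch merely illustrates the kind of bidirectional spike encoding under symmetric quantization with membrane potential clipping that the abstract describes. The function name, the soft-reset rule, and the specific clipping scheme are assumptions made for illustration, not the authors' actual SDLLM algorithm.

```python
import torch

def bidirectional_spike_encode(x: torch.Tensor, timesteps: int = 4, clip: float = 1.0):
    """Illustrative sketch (not the paper's method): encode real-valued activations
    into bidirectional {-1, 0, +1} spike trains via symmetric quantization, with the
    membrane potential clipped to bound the number of emitted spikes."""
    scale = x.abs().amax() / timesteps + 1e-8   # symmetric, zero-centered step size
    bound = clip * x.abs().amax()               # hypothetical membrane-potential clip
    potential = x.clone()                       # inject the input once as initial charge
    spikes = []
    for _ in range(timesteps):
        potential = potential.clamp(-bound, bound)                         # clip potential
        s = (potential >= scale).float() - (potential <= -scale).float()   # emit +1, -1, or 0
        potential = potential - s * scale                                  # soft reset
        spikes.append(s)
    return torch.stack(spikes), scale  # reconstruction: spikes.sum(0) * scale ~= x
```

Under these assumptions, a downstream layer could replace multiplications with signed accumulations of weight columns selected by the ±1 spikes, which is one plausible reading of the abstract's claim of "relying solely on sparse additions."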
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 23309