Abstract: Use cases such as privacy-preserving inference call for running language models at the edge. This has given rise to small language models targeting deployment on resource-constrained devices, where energy efficiency is critical. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and several recent works have realized transformer-based models on SNNs. However, key operations such as softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more compatible with neuromorphic hardware. Sorbet incorporates a novel shifting-based softmax, PTsoftmax, and a BitShifting-based PowerNorm (BSPN), both designed to replace their energy-intensive counterparts. By leveraging knowledge distillation and model quantization, Sorbet yields a highly compressed binary-weight model that maintains competitive performance while achieving $27.16\times$ energy savings compared to BERT. We validate Sorbet through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference. Our code is publicly available at [https://github.com/Kaiwen-Tang/Sorbet](https://github.com/Kaiwen-Tang/Sorbet)
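To make the shift-based idea concrete, below is a minimal sketch, not the paper's exact PTsoftmax or BSPN. It assumes the softmax replaces $e^x$ with $2^x$ and rounds the normalizing denominator to a power of two, and that the normalization rounds PowerNorm's root-mean-square divisor to a power of two, so every division becomes a right shift on integer hardware. Function names, the per-token statistics in `bspn_sketch`, and all numeric details here are illustrative assumptions.

```python
import numpy as np

def pt_softmax_sketch(x: np.ndarray) -> np.ndarray:
    """Shift-based softmax sketch (hypothetical; not the paper's exact PTsoftmax)."""
    x = x - x.max(axis=-1, keepdims=True)        # standard numerical stabilization
    p = np.exp2(np.floor(x))                     # 2^floor(x): a bit shift on integer hardware
    # Round the normalizer up to the nearest power of two so the final
    # division is a right shift rather than a true divide.
    denom = np.exp2(np.ceil(np.log2(p.sum(axis=-1, keepdims=True))))
    return p / denom                             # approximates softmax; rows sum to at most 1

def bspn_sketch(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
                eps: float = 1e-5) -> np.ndarray:
    """PowerNorm-style normalization with a power-of-two divisor (hypothetical BSPN stand-in)."""
    ms = (x * x).mean(axis=-1, keepdims=True) + eps   # second moment, as in PowerNorm
    shift = np.exp2(np.round(0.5 * np.log2(ms)))      # sqrt(ms) rounded to a power of two
    return gamma * (x / shift) + beta                 # divide-by-power-of-two = right shift

# Toy usage: one row of attention scores.
scores = np.array([[2.0, 0.5, -1.0]])
print(pt_softmax_sketch(scores))   # shift-friendly approximation of softmax(scores)
```

The common design choice in both sketches is to confine every transcendental or divide operation to powers of two, which neuromorphic and other integer-only hardware can execute as shifts.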
Lay Summary: To protect user privacy, we aim to run language models directly on small devices such as phones, which have limited computing power and must conserve energy. Many key steps in standard language models are hard to run efficiently on such low-power hardware. To solve this, we developed Sorbet, a new language model based on spiking neural networks (SNNs) that is designed specifically for energy-efficient hardware. Sorbet replaces traditional power-hungry operations with new shift-based methods, making the model far more energy-efficient. Using knowledge distillation and quantization, we also compressed Sorbet into a very small model while keeping strong accuracy. Our tests show that Sorbet uses over 27 times less energy than a standard model like BERT. This work opens the door to smarter, more energy-efficient devices that better protect user privacy.
Link To Code: https://github.com/Kaiwen-Tang/Sorbet
Primary Area: Deep Learning->Algorithms
Keywords: Spiking Neural Networks; Small Language Models; Energy Efficiency
Submission Number: 11293