Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

Published: 01 Nov 2023, Last Modified: 22 Dec 2023, MLNCP Poster
Keywords: Recurrent neural networks, Efficient machine learning, Neuromorphic compute, Pruning, Weight sparsity, Activity sparsity, Language modelling
TL;DR: Combining activity sparsity and weight sparsity to achieve greater efficiency on language modelling tasks
Abstract: Artificial neural networks open up unprecedented machine learning capabilities at the cost of seemingly ever-growing computational requirements. Concurrently, the field of neuromorphic computing develops biologically inspired spiking neural networks and hardware platforms with the goal of bridging the efficiency gap between biological brains and deep learning systems. Yet spiking neural networks often fall behind deep learning systems on many machine learning tasks. In this work, we demonstrate that the reduction in operations obtained from sparsely activated recurrent neural networks multiplies with the reduction obtained from sparse weights. Our model achieves up to a $20\times$ reduction in operations while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This reduction factor has not been achieved with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved with sparsely activated spiking neural networks. Our results suggest further driving the convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
Submission Number: 18
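
The multiplicative claim in the abstract can be illustrated with a toy operation count. The sketch below is not the paper's model or measurement: it counts the multiply-accumulates in a single matrix-vector product, and it assumes that the positions of zero activations are independent of the positions of pruned weights. The layer size, the 4x weight reduction, the 5x activity reduction, and the helper names are all hypothetical values chosen for illustration.

```python
import random


def count_macs(weights, activations):
    """Count multiply-accumulates where both the weight and the input are nonzero."""
    return sum(
        1
        for row in weights
        for w, a in zip(row, activations)
        if w != 0.0 and a != 0.0
    )


def sparse_vector(n, nonzeros, rng):
    """Length-n vector with exactly `nonzeros` nonzero entries at random positions."""
    vec = [0.0] * n
    for i in rng.sample(range(n), nonzeros):
        vec[i] = rng.uniform(0.5, 1.5)  # strictly nonzero placeholder value
    return vec


rng = random.Random(0)

n = 400                # hypothetical layer width
weights_per_row = 100  # keep 1 in 4 weights  -> 4x weight reduction (assumed)
active_units = 80      # 1 in 5 units active  -> 5x activity reduction (assumed)

W = [sparse_vector(n, weights_per_row, rng) for _ in range(n)]
x = sparse_vector(n, active_units, rng)

dense_ops = n * n
sparse_ops = count_macs(W, x)

weight_factor = n / weights_per_row
activity_factor = n / active_units
expected_reduction = weight_factor * activity_factor
measured_reduction = dense_ops / sparse_ops

print(f"expected reduction: {expected_reduction:.1f}x")
print(f"measured reduction: {measured_reduction:.1f}x")
```

Under the independence assumption, the measured reduction should land close to the product of the two factors. If active units were correlated with the surviving weights, the measured value would deviate from that product.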