Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

Published: 11 Feb 2025, Last Modified: 13 May 2025, MLSys 2025, CC BY 4.0
Keywords: Deep Neural Network Deployment, TinyML, Pruning, Microcontrollers, RISC-V
TL;DR: Acceleration of N:M pruned deep neural networks on RISC-V microcontrollers with efficient software kernels and lightweight hardware extensions.
Abstract: The acceleration of pruned Deep Neural Networks (DNNs) on edge devices such as Microcontrollers (MCUs) is a challenging task, given the tight area and power constraints of these devices. In this work, we propose a three-fold contribution to address this problem. First, we design a set of optimized software kernels for N:M pruned layers, targeting ultra-low-power, multicore RISC-V MCUs, which are up to 2.1× and 3.4× faster than their dense counterparts at 1:8 and 1:16 sparsity, respectively. Then, we implement a lightweight Instruction-Set Architecture (ISA) extension to accelerate the indirect load and non-zero index decompression operations required by our kernels, obtaining up to 1.9× extra speedup at the cost of a 5% area overhead. Lastly, we extend an open-source DNN compiler to utilize our sparse kernels for complete networks, showing speedups of 3.21× and 1.81× on ResNet18 and a Vision Transformer (ViT), respectively, with less than 1.5% accuracy drop compared to a dense baseline.
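To make the "indirect load" and "non-zero index decompression" steps concrete, here is a minimal sketch of a 1:16 sparse dot product. The abstract does not specify the storage format, so everything below is an assumption for illustration: one kept weight per group of 16, its 4-bit in-group offset packed two per byte (low nibble first), and magnitude-based selection of the kept weight. The helpers `nm_compress_1_16` and `nm_dot_1_16` are hypothetical names. They are not the authors' optimized RISC-V kernels. They only show the two operations that the proposed ISA extension is said to accelerate.

```python
M = 16        # group size for 1:16 sparsity (assumed example)
IDX_BITS = 4  # log2(M) bits per packed in-group offset


def nm_compress_1_16(dense):
    """Hypothetical helper: keep the largest-magnitude weight in each group of M
    (magnitude pruning is an assumption) and return (values, packed_offsets).
    Two 4-bit offsets are packed per byte, low nibble first."""
    assert len(dense) % M == 0, "length must be a multiple of the group size"
    groups = len(dense) // M
    vals = []
    idx = bytearray((groups + 1) // 2)
    for g in range(groups):
        group = dense[g * M:(g + 1) * M]
        best = max(range(M), key=lambda k: abs(group[k]))  # first max on ties
        vals.append(group[best])
        idx[g // 2] |= best << (IDX_BITS * (g % 2))  # pack offset into a nibble
    return vals, bytes(idx)


def nm_dot_1_16(vals, idx, act):
    """Hypothetical helper: dot product of compressed 1:16 weights with dense activations."""
    acc = 0
    for g, v in enumerate(vals):
        off = (idx[g // 2] >> (IDX_BITS * (g % 2))) & (M - 1)  # index decompression
        acc += v * act[g * M + off]                             # indirect load
    return acc


# Usage: two groups, one non-zero weight in each.
w = [0] * 32
w[3], w[30] = 5, -7                # offsets 3 (group 0) and 14 (group 1)
x = [i - 10 for i in range(32)]    # arbitrary activations

vals, idx = nm_compress_1_16(w)
print(vals, idx)                   # [5, -7] b'\xe3'
print(nm_dot_1_16(vals, idx, x))   # -175, same as the dense dot product
```

In this assumed layout, a group of 16 int8 weights (16 bytes when dense) shrinks to one value byte plus half a byte of index. The inner loop, however, replaces a contiguous load with a shift-and-mask followed by a data-dependent load. That per-element overhead is why the abstract's dedicated instructions for these two steps can pay off.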
Submission Number: 192