Dynamic Pruning for Parsimonious CNN Inference on Embedded Systems

Paola Busia, Ilias Theodorakopoulos, Vasileios K. Pothos, Nikos Fragoulis, Paolo Meloni

Published: 2022, Last Modified: 08 May 2026. Venue: DASIP 2022. License: CC BY-SA 4.0
Abstract: As a consequence of the current edge-processing trend, the deployment of Convolutional Neural Networks (CNNs) has spread to a rich landscape of devices, highlighting the need to reduce algorithmic complexity and to exploit hardware-aided computing as two prospective ways to improve performance on resource-constrained embedded systems. In this work, we consider a compression method that reduces a CNN's computational workload based on the complexity of the data to be processed, by pruning unnecessary connections at runtime. To evaluate its efficiency on edge-processing platforms, we test a keyword spotting (KWS) task executing on SensorTile, a low-power microcontroller platform by ST, and an image recognition task running on NEURAghe, an FPGA-based inference accelerator. In the first case, we obtained a 51% average reduction of the computing workload, resulting in up to 44% inference speedup and 15% energy savings, while in the latter, a 36% speedup is achieved thanks to a 44% workload reduction.
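The abstract does not spell out the pruning criterion, so the following is only an illustrative sketch of the general idea of input-dependent pruning at runtime, not the authors' method. It assumes a hypothetical saliency measure (mean absolute activation per input channel) and a hypothetical `threshold` parameter: channels below the threshold are skipped in a 1x1 convolution, and the function reports how many multiply-accumulate operations (MACs) were executed compared with the dense layer.

```python
def channel_saliency(feature_map):
    """Mean absolute activation per input channel (assumed saliency measure)."""
    return [sum(abs(v) for v in ch) / len(ch) for ch in feature_map]


def dynamic_pointwise_conv(feature_map, weights, threshold):
    """1x1 convolution that skips low-saliency input channels at runtime.

    feature_map: list of C_in channels, each a list of N activations.
    weights:     list of C_out rows, each a list of C_in weights.
    threshold:   hypothetical cut-off; channels with saliency below it are pruned.
    Returns (output, macs_done, macs_dense).
    """
    saliency = channel_saliency(feature_map)
    # Keep only the channels that this particular input makes relevant.
    active = [c for c, s in enumerate(saliency) if s >= threshold]
    n = len(feature_map[0])

    output = []
    for row in weights:
        out_ch = [0.0] * n
        for c in active:  # connections from pruned channels are never computed
            w = row[c]
            ch = feature_map[c]
            for i in range(n):
                out_ch[i] += w * ch[i]
        output.append(out_ch)

    macs_done = len(weights) * len(active) * n
    macs_dense = len(weights) * len(feature_map) * n
    return output, macs_done, macs_dense


if __name__ == "__main__":
    fmap = [[1.0, 2.0], [0.0, 0.01], [3.0, -1.0]]  # 3 channels, 2 positions
    w = [[1.0, 1.0, 1.0], [0.5, 2.0, -1.0]]        # 2 output channels
    out, done, dense = dynamic_pointwise_conv(fmap, w, threshold=0.1)
    print(out, f"MACs: {done}/{dense}")
```

With `threshold=0.0` every channel stays active and the function reduces to the dense layer, which gives a simple baseline for measuring the workload saved on a given input. The saving depends on the input, which is why figures such as those in the abstract are reported as averages.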