msf-CNN: Multi-Stage Fusion with Convolutional Neural Networks for TinyML

14 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY 4.0
TL;DR: msf-CNN efficiently uses fusion to optimize CNN inference execution on MCUs, reducing RAM usage by up to 50% compared to prior methods while offering greater design flexibility.
Abstract: AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are essential to fit within an MCU's tiny memory budget, e.g., 128 kB of RAM. At the same time, inference latency must remain small to meet real-time constraints. One approach to this problem is *fusion*, which optimizes data flows across neural network layers. In this paper, we introduce *msf-CNN*, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs) by walking the fusion solution space represented as a directed acyclic graph. Compared to previous work on CNN fusion for MCUs, msf-CNN identifies a wider set of solutions. We published an implementation of msf-CNN running on various microcontrollers (ARM Cortex-M, RISC-V, ESP32). We show, for instance, that msf-CNN performs inference using 50% less RAM than the prior art (MCUNetV2 and StreamNet). msf-CNN thus offers additional flexibility to system designers.
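The abstract describes searching a fusion solution space represented as a directed acyclic graph. A minimal sketch of that idea, under assumed simplifications: a chain CNN whose nodes are "first *k* layers scheduled", whose edges are candidate fused blocks, and a toy per-block RAM cost model (`layer_buf` and `block_ram` are hypothetical illustrations, not the paper's actual cost model). A Dijkstra-style walk then finds the fusion partition minimizing peak RAM:

```python
import heapq

# Hypothetical per-layer activation buffer sizes (bytes) for a small CNN.
layer_buf = [32_000, 64_000, 64_000, 16_000, 8_000]

def block_ram(i, j):
    # Toy cost model (an assumption, not the paper's): a fused block over
    # layers i..j needs its input buffer, its output buffer, and a small
    # rolling cache that grows with the block length.
    cache = 4_000 * (j - i)
    return layer_buf[i] + layer_buf[j] + cache

def min_peak_ram(n_layers):
    """Walk the fusion DAG: node k = 'first k layers scheduled',
    edge (i -> j) = run layers i..j-1 as one fused block.
    Minimize the peak RAM over all blocks (minimax path)."""
    best = {0: 0}
    heap = [(0, 0)]  # (peak RAM so far, layers scheduled)
    while heap:
        peak, i = heapq.heappop(heap)
        if i == n_layers:
            return peak  # first settled goal node is optimal (Dijkstra)
        if peak > best.get(i, float("inf")):
            continue  # stale heap entry
        for j in range(i + 1, n_layers + 1):
            new_peak = max(peak, block_ram(i, j - 1))
            if new_peak < best.get(j, float("inf")):
                best[j] = new_peak
                heapq.heappush(heap, (new_peak, j))
    return None

print(min_peak_ram(len(layer_buf)))  # peak RAM of the best fusion partition
```

Under this toy model, fusing all five layers into one block wins (56 kB peak), whereas running each layer unfused peaks at 128 kB, which is the kind of RAM reduction the abstract attributes to fusion.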
Primary Area: Deep Learning
Keywords: TinyML, DNN, Fusion, Microcontroller, Random access memory
Submission Number: 1641