PULP-TrainLib: Enabling On-Device Training for RISC-V Multi-core MCUs Through Performance-Driven Autotuning

SAMOS 2022 (modified: 09 Nov 2022)
Abstract: An open challenge in making Internet-of-Things sensor nodes "smart" and self-adaptive is enabling on-chip Deep Neural Network (DNN) training on Ultra-Low-Power (ULP) microcontroller units (MCUs). To this end, we present a framework, based on PULP-TrainLib, to deploy DNN training tasks on RISC-V-based Parallel-ULP (PULP) MCUs. PULP-TrainLib is a library of parallel software DNN primitives enabling the execution of forward and backward steps on PULP MCUs. To optimize PULP-TrainLib's kernels, we propose a strategy to automatically select and configure (autotune) the fastest among a set of tiling options and optimized floating-point matrix multiplication kernels, according to the tensor shapes of each DNN layer. Results on an 8-core RISC-V MCU show that our auto-tuned primitives improve MAC/clk by up to 2.4× compared to a "one-size-fits-all" matrix multiplication, achieving up to 4.39 MAC/clk, 36.6× better than a commercial STM32L4 MCU executing the same DNN layer training workload. Furthermore, our strategy proves to be 30.7× faster than AIfES, a state-of-the-art training library for MCUs, when training a complete TinyML model.
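To make the autotuning idea concrete, the sketch below benchmarks two candidate matrix multiplication kernels for a given layer shape and keeps the faster one. This is a minimal illustration, not PULP-TrainLib's actual API: the kernel names, the fixed 64×64×64 shape, and the use of clock() as a timing proxy are assumptions; on a PULP MCU the selection would be driven by hardware cycle counters and would also sweep tiling options.

```c
/* Illustrative autotuning sketch (hypothetical names, not PULP-TrainLib's API):
 * time each candidate matmul kernel for one layer's tensor shape and keep
 * the fastest. On a real PULP MCU, cycle counters would replace clock(). */
#include <stdio.h>
#include <time.h>

#define N 64          /* example layer shape: (N x K) * (K x M) */
#define K 64
#define M 64

static float A[N][K], B[K][M], C[N][M];

/* Candidate 1: naive triple loop. */
static void mm_naive(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++) acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

/* Candidate 2: inner loop unrolled by 2 (assumes K is even). */
static void mm_unroll2(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k += 2)
                acc += A[i][k] * B[k][j] + A[i][k + 1] * B[k + 1][j];
            C[i][j] = acc;
        }
}

typedef void (*mm_fn)(void);

/* Elapsed CPU time for one kernel invocation (timing proxy on a host). */
static double bench(mm_fn f) {
    clock_t t0 = clock();
    f();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    mm_fn candidates[] = { mm_naive, mm_unroll2 };
    const char *names[] = { "naive", "unroll2" };
    int best = 0;
    double best_t = bench(candidates[0]);
    for (int i = 1; i < 2; i++) {
        double t = bench(candidates[i]);
        if (t < best_t) { best_t = t; best = i; }
    }
    printf("fastest kernel for %dx%dx%d: %s (%.6f s)\n",
           N, K, M, names[best], best_t);
    return 0;
}
```

In the paper's setting, this per-shape selection is what allows each DNN layer to use the matrix multiplication variant and tiling configuration best matched to its tensor dimensions, rather than a single one-size-fits-all kernel.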