NP-CGRA: Extending CGRAs for Efficient Processing of Light-weight Deep Neural Networks

Published: 01 Jan 2021, Last Modified: 06 Feb 2025 · DATE 2021 · CC BY-SA 4.0
Abstract: Coarse-grained reconfigurable architectures (CGRAs) can provide both high energy efficiency and flexibility, making them well suited for machine learning applications. However, previous work on CGRAs offers very limited support for deep neural networks (DNNs), especially for recent lightweight models built on depthwise separable convolution (DSC), which are an important workload in mobile environments. In this paper, we propose a set of architecture extensions and a mapping scheme that greatly enhance CGRA performance on DSC kernels. Our experimental results using MobileNets demonstrate that our proposed CGRA enhancement delivers an 8-18x improvement in area-delay product over a baseline CGRA with a state-of-the-art CGRA compiler, depending on layer type. Moreover, our proposed CGRA architecture can also speed up 3D convolution with efficiency similar to previous work, demonstrating the effectiveness of our architectural features beyond DSC layers.
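For context, DSC factors a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution that mixes channels, which is why it is considered lightweight. The following sketch is illustrative only and not taken from the paper; the layer dimensions are hypothetical. It compares multiply-accumulate (MAC) counts for the two formulations.

```python
# Illustrative sketch (not from the paper): MAC counts for a standard
# convolution versus a depthwise separable convolution (DSC) on one layer.
# Layer shape values below are hypothetical examples.

def conv_macs(h, w, c_in, c_out, k):
    # Standard convolution: every output channel mixes all input channels.
    return h * w * c_in * c_out * k * k

def dsc_macs(h, w, c_in, c_out, k):
    # Depthwise stage: one k x k filter per input channel.
    depthwise = h * w * c_in * k * k
    # Pointwise stage: 1x1 convolution mixing channels.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 64, 128, 3  # example MobileNet-like layer
    std = conv_macs(h, w, c_in, c_out, k)
    dsc = dsc_macs(h, w, c_in, c_out, k)
    print(f"standard conv MACs: {std:,}")
    print(f"DSC MACs:           {dsc:,}  (~{std / dsc:.1f}x fewer)")
```

Because the depthwise and pointwise stages have much lower arithmetic intensity than a standard 3D convolution, they map poorly onto conventional CGRA datapaths, which is the gap the proposed extensions target.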