Algorithm-Hardware Co-design for Accelerating Depthwise Separable CNNs

Published: 01 Jan 2025, Last Modified: 09 Nov 2025 · ACM Trans. Design Autom. Electr. Syst. 2025 · CC BY-SA 4.0
Abstract: Depthwise separable convolution (DSC) is a popular method for constructing lightweight neural networks. However, the pointwise convolution (PWC) has far more parameters than the depthwise convolution (DWC), producing an imbalanced PWC-to-DWC parameter ratio. In this article, we propose a hardware-efficient convolution (Shared Kernel sliding on channel Convolution, SKC) to replace the redundant PWC in DSC and balance the parameter ratio: SKC shares a customized kernel along the channel dimension to reduce the number of parameters, and its local connectivity in the channel dimension reduces the computation. Furthermore, the proposed SKC is well suited to Winograd acceleration, and a large-kernel decomposition method is introduced to enable it. We implement the first Winograd-based FPGA hardware accelerator for DSCNets. A shared 1D and 2D Winograd convolution computing engine is proposed to efficiently compute the proposed DSC consisting of DWC and SKC. An alternating loading-and-reuse storage approach is developed to efficiently load SKC input feature maps. Experimental results show that, through algorithm-hardware co-design, our DSC-based accelerator achieves 20× higher power efficiency than traditional accelerators at the cost of a small accuracy loss.
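The parameter imbalance the abstract describes follows directly from the standard DSC parameter-count formulas (DWC: one K×K kernel per input channel; PWC: a 1×1 kernel mapping every input channel to every output channel). A minimal sketch, with hypothetical MobileNet-style layer sizes chosen only for illustration:

```python
# Parameter counts for the two halves of a depthwise separable convolution.
def dwc_params(c_in: int, k: int) -> int:
    # Depthwise convolution: one k*k filter per input channel.
    return c_in * k * k

def pwc_params(c_in: int, c_out: int) -> int:
    # Pointwise (1x1) convolution: a full c_in -> c_out channel mixing.
    return c_in * c_out

# Hypothetical layer sizes (not from the paper) to show the imbalance.
c_in, c_out, k = 256, 256, 3
dwc = dwc_params(c_in, k)      # 256 * 3 * 3 = 2304
pwc = pwc_params(c_in, c_out)  # 256 * 256 = 65536
print(f"PWC/DWC parameter ratio: {pwc / dwc:.1f}")  # ~28.4x
```

At these sizes the PWC holds over 96% of the layer's parameters, which is the redundancy that SKC's shared, locally connected channel kernel is designed to remove.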