A High-speed Low-cost CNN Inference Accelerator for Depthwise Separable Convolution

Published: 01 Jan 2020, Last Modified: 20 May 2025 · ICTA 2020 · CC BY-SA 4.0
Abstract: This paper proposes a high-speed, low-cost VLSI inference accelerator for depthwise separable convolution in deep convolutional neural networks (CNNs). The accelerator consists of a parallel, pipelined depthwise convolution processing element (DPE) array and a pointwise convolution processing element (PPE) array, improving system performance while keeping hardware resource consumption low. Moreover, the PPE array can also execute fully-connected layers in CNNs. An FPGA prototype of the proposed accelerator was implemented. It executed a 12-layer simplified MobileNet model at over 15,000 frames per second (FPS) on $32\times 32$ images and 240 FPS on $224\times 224$ images.
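For reference, a minimal NumPy sketch of the depthwise separable convolution the accelerator targets. This is illustrative only: the shapes, the naive loop structure, and the mapping of the two stages onto the paper's DPE and PPE arrays are assumptions, not the authors' hardware design.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution: a per-channel (depthwise) KxK
    convolution followed by a 1x1 (pointwise) convolution across channels.
    x:          input feature map, shape (H, W, C_in)
    dw_kernels: one KxK filter per input channel, shape (K, K, C_in)
    pw_kernels: 1x1 filters mixing channels, shape (C_in, C_out)
    Returns a feature map of shape (H-K+1, W-K+1, C_out) (no padding)."""
    H, W, C_in = x.shape
    K = dw_kernels.shape[0]
    Ho, Wo = H - K + 1, W - K + 1

    # Depthwise stage: each channel is convolved with its own filter
    # (conceptually, the work handled by the DPE array).
    dw_out = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(x[i:i+K, j:j+K, c] * dw_kernels[:, :, c])

    # Pointwise stage: a 1x1 convolution, i.e. a matrix multiply over the
    # channel dimension at every pixel (conceptually the PPE array's job;
    # the same multiply-accumulate structure also covers fully-connected layers).
    pw_out = dw_out.reshape(Ho * Wo, C_in) @ pw_kernels
    return pw_out.reshape(Ho, Wo, -1)

# Example: a 32x32 three-channel input, 3x3 depthwise filters, 8 output channels.
x = np.random.rand(32, 32, 3)
y = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(3, 8))
print(y.shape)  # (30, 30, 8)
```

Because the pointwise stage reduces to a matrix multiply over channels, reusing the same PPE datapath for fully-connected layers (which are also matrix multiplies) is a natural fit.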