Mobile-X: Dedicated FPGA Implementation of the MobileNet Accelerator Optimizing Depthwise Separable Convolution

Hyeonseok Hong, Dahun Choi, Namjoon Kim, Hyun Kim

Published: 01 Jan 2024, Last Modified: 02 Aug 2025IEEE Trans. Circuits Syst. II Express Briefs 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: MobileNet proposed depthwise separable convolution (DSC) as a replacement for standard convolution (SC), achieving significant reductions in parameters and computational complexity compared with traditional convolutional neural network (CNN) models. Recently, there has been a growing trend of deploying MobileNet on various edge devices by implementing accelerators. However, the distinctive computational patterns of depthwise convolution (DWC) and pointwise convolution (PWC) in MobileNet pose challenges for FPGA and ASIC accelerator implementations. In this brief, we propose DSC-dedicated processing engine (PE) designs specialized for DWC and PWC operations and an SC reordering module for only the first convolution layer. In addition, we introduce the pipeline DSC processing called pipelining separable convolution (PSC) and tiled-convolution (TC) techniques that consider the computational load of PWC. Our proposed 8-bit quantization in the accelerator causes only a negligible accuracy drop (i.e., 0.68%) compared with full precision, yet it enables hardware-friendly operations with only a single fixed-point multiplication. On the ZCU-102 platform, the proposed accelerator achieves 190.9 FPS and 108.3 GOPS using minimal hardware resources. Consequently, we achieve 18.20 GOPS/W, showing a $3.7\times $ power efficiency compared to the A-100 GPU.