DeepDive: An Integrative Algorithm/Architecture Co-Design for Deep Separable Convolutional Neural Networks

Mohammadreza Baharani, Ushma Sunil, Kaustubh Manohar, Steven Furgurson, Hamed Tabkhi

Published: 2021, Last Modified: 22 Nov 2023
Venue: ACM Great Lakes Symposium on VLSI 2021
Readers: Everyone

Abstract: Deep Separable Convolutional Neural Networks (DSCNNs) have become an emerging paradigm, offering modular networks with structural sparsity that achieve higher accuracy with relatively fewer operations and parameters. However, customized architectures that provide flexible solutions fitting the sparsity of DSCNNs are lacking. This paper introduces DeepDive, a fully functional vertical co-design framework for power-efficient implementation of DSCNNs on edge FPGAs. DeepDive's architecture provides heterogeneous Compute Units (CUs) that fully support DSCNNs with various convolutional operators interconnected with structural sparsity. It offers FPGA-aware training and online quantization combined with modular, synthesizable C++ CUs customized for DSCNNs. Execution results on Xilinx's ZCU102 FPGA board demonstrate 47.4 and 233.3 FPS/Watt for MobileNet-V2 and a compact version of EfficientNet, respectively, two state-of-the-art depthwise separable CNNs. Compared with the Jetson Nano, DeepDive improves FPS/Watt by 2.2× and 1.51× in high and low power modes, respectively. It also improves FPS/Watt by about 2.27× and 37.25× over two other FPGA implementations.
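
The abstract's point about fewer operations and parameters can be made concrete with a small worked example. The sketch below is not taken from the paper: it uses the standard textbook cost formulas for a convolution layer and for its depthwise separable counterpart (a depthwise k×k convolution followed by a 1×1 pointwise convolution). The function names and the layer shape (3×3 kernel, 32 input channels, 64 output channels, 56×56 output map) are hypothetical values chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and multiply-accumulates (MACs) of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def separable_conv_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and MACs of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 conv that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer shape, chosen only for illustration.
shape = dict(k=3, c_in=32, c_out=64, h_out=56, w_out=56)
std_params, std_macs = standard_conv_cost(**shape)
sep_params, sep_macs = separable_conv_cost(**shape)

print(f"standard : {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"reduction: {std_macs / sep_macs:.2f}x")
```

Under these assumptions the reduction factor equals 1 / (1/c_out + 1/k²), so it grows with both the kernel size and the number of output channels. This accounts only for arithmetic cost; it says nothing about the accuracy, throughput, or power figures reported in the abstract.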
