DL-VOPU: An Energy-Efficient Domain-Specific Deep-Learning-Based Visual Object Processing Unit Supporting Multi-Scale Semantic Feature Extraction for Mobile Object Detection/Tracking Applications

Yuchuan Gong, Teng Zhang, Hongtao Guo, Xiyuan Liu, Jingxiao Zheng, Hongqiang Wu, Conghan Jia, Luying Que, Liang Zhou, Liang Chang, Jun Zhou

Published: 01 Jan 2023, Last Modified: 10 May 2023, ISSCC 2023, Readers: Everyone

Abstract: In recent years, deep learning-based visual object detection/tracking (VODT) has been widely used in intelligent applications such as autonomous driving, UAVs, smart robots, and VR/AR. As general AI hardware platforms, GPUs and general-purpose AI processors are often used to accelerate VODT. However, without a domain-specific architecture, it is difficult for these processors to achieve high energy efficiency, which makes them unsuitable for mobile VODT applications. Recently, some dedicated VODT processors with improved energy efficiency have been proposed [1]–[3]. As shown in Fig. 22.7.1, these designs have several issues: 1) they support only a single task (either detection or tracking), 2) they lack full support for state-of-the-art VODT frameworks based on multi-scale semantic feature extraction (MSFE) [4], and 3) they do not sufficiently exploit domain-specific features to optimize energy efficiency. To address these issues, this work proposes a deep learning-based visual object processor, named DL-VOPU, for mobile VODT applications. It exploits diverse domain-specific features to achieve record-high energy efficiency for VODT, while supporting MSFE-based VODT frameworks with a programmable backbone network. The DL-VOPU features: 1) an energy-efficient MSFE-aware AI architecture, 2) an object-oriented adaptive computing technique for energy-efficient object tracking, 3) a parallel frame-difference computing technique for energy-efficient neural network (NN) computation on video streams, and 4) a unified data compression and computing technique to address data redundancy in VODT processing.
