Abstract: Vision transformers (ViTs) have continuously achieved new milestones in computer vision.
A natural way to use ViTs in detection is to directly replace the CNN-based backbone with a transformer-based one, but at the price of a considerable computational burden when deployed on resource-limited edge devices.
A more promising direction is the DETR family, which eliminates the need for many hand-designed components in object detection but still cannot reach real-time speed in edge applications.
In this paper, we propose a novel hardware-efficient adaptive-thinning DETR (HeatDETR) that, for the first time, achieves high-speed and even real-time inference on multiple edge devices.
Specifically, our work makes three main contributions:
1) For strong detection performance, we introduce a backbone design principle based on the visual modeling process, which proceeds from locality to globality. Building on it, we propose a semantic-augmented module (SAM) in the backbone that uses the global modeling capability of self-attention to enhance low-level semantics, and an attention-based task-couple module (TCM) that reduces the conflict between the classification and regression tasks (a SAM sketch follows this list).
2) For on-device efficiency, we propose a scale-combined module (SCM) that transforms the multi-level detection process into a single-level one, eliminating multi-branch inference for higher hardware speed while maintaining detection performance (see the SCM sketch below). We then revisit the network architectures and operators used in ViT-based models and reparameterized CNNs, identify hardware-efficient designs, and introduce the basic HeatDETR structure (a reparameterization sketch is also given below).
3) With our device-adaptive model-thinning strategy, deployable end-to-end HeatDETR models for target devices can be generated efficiently (see the thinning sketch below).
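The abstract only states that SAM injects the global modeling capability of self-attention into low-level features; the concrete design below (module name aside, the token flattening, head count, and residual wiring are all our assumptions) is a minimal sketch of that idea, not the paper's actual implementation.

```python
# Hypothetical sketch of a semantic-augmented module (SAM): global
# self-attention applied to a low-level feature map. All design details
# beyond "self-attention enhances low-level semantics" are assumptions.
import torch
import torch.nn as nn

class SAM(nn.Module):
    """Enhance a low-level feature map with global self-attention."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> flatten spatial positions into a token sequence
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens) # global context
        tokens = self.norm(tokens + attended)           # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

low_level = torch.randn(1, 64, 40, 40)   # e.g., an early backbone stage
enhanced = SAM(64)(low_level)            # same shape, globally contextualized
```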
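For SCM, the abstract says only that multi-level detection is folded into a single-level process to avoid multi-branch inference; the resize-and-fuse scheme below is one plausible realization under that description, with the target resolution and 1x1 fusion conv being our assumptions.

```python
# Hypothetical sketch of the scale-combined module (SCM): align a feature
# pyramid to one resolution and fuse it, so the head runs on a single branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCM(nn.Module):
    """Merge a multi-level feature pyramid into one single-level map."""
    def __init__(self, in_channels: int, num_levels: int):
        super().__init__()
        self.fuse = nn.Conv2d(in_channels * num_levels, in_channels, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Resize every level to the middle level's resolution, then fuse.
        target = feats[len(feats) // 2].shape[-2:]
        aligned = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in feats]
        return self.fuse(torch.cat(aligned, dim=1))  # one branch for the head

pyramid = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
single = SCM(256, num_levels=3)(pyramid)   # -> (1, 256, 32, 32)
```

A single fused map removes the per-level head branches at inference time, which is the hardware-speed benefit the abstract attributes to SCM.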
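The abstract names reparameterized CNNs among the hardware-efficient designs it revisits. Below is a minimal RepVGG-style structural reparameterization sketch (not HeatDETR's actual block; batch-norm folding is omitted for brevity): a multi-branch training block collapses into a single 3x3 conv for inference.

```python
# RepVGG-style structural reparameterization: train with 3x3 + 1x1 + identity
# branches, deploy a single equivalent 3x3 conv (BN folding omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        # Multi-branch form used during training.
        return F.relu(self.conv3(x) + self.conv1(x) + x)

    def reparameterize(self) -> nn.Conv2d:
        """Fold the 1x1 branch and the identity into the 3x3 kernel."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          3, padding=1)
        kernel = self.conv3.weight.detach().clone()
        kernel[:, :, 1:2, 1:2] += self.conv1.weight.detach()  # 1x1 at center tap
        for c in range(kernel.shape[0]):
            kernel[c, c, 1, 1] += 1.0                         # identity branch
        fused.weight.data = kernel
        fused.bias.data = (self.conv3.bias + self.conv1.bias).detach()
        return fused

# The fused conv reproduces the three-branch sum exactly (ReLU stays outside).
block, x = RepBlock(64), torch.randn(1, 64, 32, 32)
fused = block.reparameterize()
assert torch.allclose(block(x), F.relu(fused(x)), atol=1e-5)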
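The abstract gives no algorithmic detail on device-adaptive model thinning, so the sketch below assumes the simplest possible scheme: profile candidate model widths on the target device and keep the widest one within a latency budget. The `build_model` factory and the width-only search space are hypothetical.

```python
# Hypothetical sketch of device-adaptive model thinning: pick the largest
# model variant whose measured on-device latency fits a budget.
import time
import torch
import torch.nn as nn

def profile_latency(model: nn.Module, inp: torch.Tensor, runs: int = 50) -> float:
    """Average per-image latency in milliseconds on the current device."""
    model.eval()
    with torch.no_grad():
        for _ in range(5):                      # warm-up
            model(inp)
        start = time.perf_counter()
        for _ in range(runs):
            model(inp)
    return (time.perf_counter() - start) / runs * 1e3

def thin_for_device(build_model, widths, inp, budget_ms: float):
    """Keep the widest variant whose measured latency fits the budget."""
    best = None
    for w in sorted(widths):                    # try narrow to wide
        candidate = build_model(width=w)        # hypothetical model factory
        if profile_latency(candidate, inp) <= budget_ms:
            best = candidate
    return best

toy = lambda width: nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU())
model = thin_for_device(toy, widths=[32, 64, 128],
                        inp=torch.randn(1, 3, 224, 224), budget_ms=5.0)
```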
Experiments on the MS COCO dataset show that HeatDETR outperforms current DETR-based methods by 0.3%–6.2% AP with a 5%–68% speedup on a single Tesla V100.
Real-time inference can even be achieved on extremely memory-constrained devices, e.g., a dual-core Intel Core i7 CPU.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning