SpaceEvo: Searching Hardware-Friendly Search Space for Efficient Int8 Inference

22 Sept 2022 (modified: 13 Feb 2023)
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Neural Architecture Search, Search Space Design, INT8 Quantization, Edge Hardware
TL;DR: We introduce techniques to search for a quantization-friendly search space for a given device
Abstract: INT8 quantization is an essential compression tool for deploying a deep neural network (DNN) on resource-limited edge devices. While it greatly reduces model size and memory cost, current DNN models designed for the edge regime cannot fully exploit INT8 quantization to reduce inference latency. In this work, we find that the poor INT8 latency stems from a quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in a conventional model design space vary widely in quantization efficiency, and unfriendly choices can even slow down INT8 inference. To alleviate this issue, we propose SpaceEvo to efficiently search for a novel hardware-aware, quantization-friendly search space, whose top-tier sub-networks achieve both superior quantization efficiency and accuracy. The key idea is to automatically evolve hardware-preferred operators and configurations guided by a search space quality metric, called the Q-T score. However, naively training each candidate space from scratch to evaluate its Q-T score incurs prohibitive training cost, making it difficult to evolve search spaces on large-scale tasks (e.g., ImageNet). We further propose block-wise training and an INT8 accuracy lookup table to greatly reduce this cost. On diverse devices, SpaceEvo consistently outperforms existing manually designed search spaces, producing both tiny and large quantized models with superior ImageNet accuracy and hardware efficiency. The discovered models, named SeqNet, achieve up to 10.1% accuracy improvement under the same latency. Our study addresses the hardware-friendly search space design challenge in NAS and paves the way for searching the search space towards efficient deployment.
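The abstract outlines the core mechanism: evolve candidate search spaces guided by a Q-T score, with block-wise-trained INT8 accuracy lookup tables standing in for from-scratch training. Below is a minimal, self-contained toy sketch of that loop, assuming illustrative names (OPS, ACC_LUT, LAT_LUT, qt_score, mutate) and randomly filled lookup tables; it is not the authors' implementation, and the real Q-T score, on-device latency predictor, and mutation operators differ.

    # Hedged sketch (assumptions, not the paper's code): evolve a candidate
    # search space under a Q-T-style score. Sub-networks sampled from a space
    # are kept only if they meet an INT8 latency budget, and the space is
    # scored by the best accuracy proxy found among feasible samples.
    import random

    # Hypothetical per-(operator, width) INT8 accuracy proxies and latencies (ms),
    # randomly filled here; in practice these come from block-wise training and
    # on-device measurement.
    OPS = ["mbv2_k3", "mbv2_k5", "fused_mb"]
    WIDTHS = [16, 24, 32]
    ACC_LUT = {(op, w): random.uniform(0.1, 1.0) for op in OPS for w in WIDTHS}
    LAT_LUT = {(op, w): random.uniform(0.5, 3.0) for op in OPS for w in WIDTHS}

    def sample_subnet(space, depth=4):
        # A sub-network is a list of (operator, channel-width) block choices.
        return [(random.choice(space["ops"]), random.choice(space["widths"]))
                for _ in range(depth)]

    def qt_score(space, latency_limit_ms=6.0, n_samples=64):
        # Q-T score stand-in: best accuracy proxy among latency-feasible samples.
        best = float("-inf")
        for _ in range(n_samples):
            net = sample_subnet(space)
            if sum(LAT_LUT[b] for b in net) <= latency_limit_ms:
                best = max(best, sum(ACC_LUT[b] for b in net))
        return best

    def mutate(space):
        # Mutate the candidate space itself: swap one operator or width choice.
        new = {"ops": list(space["ops"]), "widths": list(space["widths"])}
        key = random.choice(["ops", "widths"])
        pool = OPS if key == "ops" else WIDTHS
        new[key][random.randrange(len(new[key]))] = random.choice(pool)
        return new

    def evolve(generations=50, population=16, topk=4):
        pop = [{"ops": random.sample(OPS, 2), "widths": random.sample(WIDTHS, 2)}
               for _ in range(population)]
        for _ in range(generations):
            pop.sort(key=qt_score, reverse=True)
            elites = pop[:topk]
            # Refill the population by mutating elite spaces.
            pop = elites + [mutate(random.choice(elites))
                            for _ in range(population - topk)]
        return max(pop, key=qt_score)

    print(evolve())

The two lookup tables are the load-bearing design choice: scoring a candidate space then reduces to cheap table lookups over sampled sub-networks, which is what makes evolving the space itself tractable on large-scale tasks.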
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip