EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization

Tianzhe Wang; Kuan Wang; Han Cai; Ji Lin; Yujun Lin; Zhijian Liu; Song Han

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Withdrawn Submission, Readers: Everyone

TL;DR: We present an end-to-end design methodology for efficient deep learning deployment.

Abstract: We present an end-to-end design methodology for efficient deep learning deployment. Unlike previous methods that separately optimize the neural network architecture, pruning policy, and quantization policy, we jointly optimize them in an end-to-end manner. To handle the larger design space this creates, we train a quantization-aware accuracy predictor that is fed to the evolutionary search to select the best fit. We first generate a large dataset of <NN architecture, ImageNet accuracy> pairs, not by training each architecture but by sampling from a unified supernet. We then use these data to train an accuracy predictor without quantization, and apply a predictor-transfer technique to obtain the quantization-aware predictor, which reduces the amount of post-quantization fine-tuning time. Extensive experiments on ImageNet show the benefits of the end-to-end methodology: it maintains the same accuracy (75.1%) as the ResNet34 float model while saving 2.2× BitOps compared with the 8-bit model; it obtains the same level of accuracy as MobileNetV2+HAQ while achieving 2×/1.3× latency/energy savings; and the end-to-end optimization outperforms separate optimization with ProxylessNAS+AMC+HAQ by 2.3% accuracy while reducing GPU hours and CO2 emissions by orders of magnitude.

Keywords: End-to-end Design, Joint Optimization, Architecture Search, Network Pruning, Network Quantization
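
The abstract describes an evolutionary search that is guided by an accuracy predictor rather than by training each candidate. The following is a minimal sketch of that idea, not the authors' implementation. Everything named here is an assumption made for illustration: the per-layer design space (KERNELS, WIDTHS, BITS), the toy cost function standing in for a BitOps-style budget, and predicted_accuracy, which is a hand-written stand-in for the trained quantization-aware predictor.

```python
import random

# Hypothetical per-layer design space (not the paper's actual search space).
KERNELS = [3, 5, 7]          # architecture choice: kernel size
WIDTHS = [0.5, 0.75, 1.0]    # pruning choice: fraction of channels kept
BITS = [4, 6, 8]             # quantization choice: bitwidth
NUM_LAYERS = 6


def random_layer(rng):
    return (rng.choice(KERNELS), rng.choice(WIDTHS), rng.choice(BITS))


def random_config(rng):
    return [random_layer(rng) for _ in range(NUM_LAYERS)]


def cost(config):
    # Toy resource proxy (assumption): grows with kernel area, width, bits^2.
    return sum(k * k * w * b * b for k, w, b in config)


def predicted_accuracy(config):
    # Stand-in for a trained accuracy predictor (assumption): a toy score
    # that rewards larger kernels, wider layers, and more bits.
    return sum(0.5 * k + 4.0 * w + 0.5 * b for k, w, b in config) / NUM_LAYERS


def mutate(config, rng, prob=0.3):
    # Resample each layer's choices independently with probability `prob`.
    return [random_layer(rng) if rng.random() < prob else layer
            for layer in config]


def crossover(a, b, rng):
    # Pick each layer from one of the two parents.
    return [rng.choice(pair) for pair in zip(a, b)]


def sample_feasible(make, budget):
    # Retry until the candidate fits the resource budget.
    while True:
        candidate = make()
        if cost(candidate) <= budget:
            return candidate


def evolutionary_search(budget, population=50, generations=20,
                        parents=10, seed=0):
    rng = random.Random(seed)
    pop = [sample_feasible(lambda: random_config(rng), budget)
           for _ in range(population)]
    for _ in range(generations):
        # Rank by predicted accuracy only; no candidate is ever trained.
        pop.sort(key=predicted_accuracy, reverse=True)
        top = pop[:parents]
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                make = lambda: mutate(rng.choice(top), rng)
            else:
                make = lambda: crossover(rng.choice(top), rng.choice(top), rng)
            children.append(sample_feasible(make, budget))
        pop = top + children
    return max(pop, key=predicted_accuracy)


if __name__ == "__main__":
    best = evolutionary_search(budget=6000)
    print("best config:", best)
    print("cost:", cost(best), "score:", round(predicted_accuracy(best), 3))
```

Because every candidate is scored by a cheap function call, the search can evaluate thousands of joint architecture, pruning, and quantization choices; in the setting the abstract describes, that function would be the predictor trained on supernet samples and transferred to the quantized case.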
