Free Bits: Platform-Aware Latency Optimization of Mixed-Precision Neural Networks for Edge Deployment

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Edge AI, TinyML, Mixed-Precision Quantization
TL;DR: By combining differentiable precision search with platform-aware heuristics, we can reduce end-to-end latency of DNNs running on microcontrollers by up to 29.2%.
Abstract: Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. However, the search space of layer-wise quantization policies is intractably large, and the execution latency of mixed-precision networks depends non-trivially and non-monotonically on precision, varying with the deployment target. This establishes the need for hardware-aware, directed heuristic search algorithms. This paper proposes a hybrid search methodology for mixed-precision network configurations, consisting of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization, to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms, each with different hardware characteristics. We achieve up to $29.2\%$ reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline at zero accuracy drop even on systems with no hardware support for sub-byte arithmetic. Furthermore, we show the superiority of our approach over both a purely heuristic search and a differentiable search targeting reduced binary operation counts.
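One way to picture the second, hardware-aware stage described in the abstract is as a greedy, budgeted refinement: starting from a uniform 8-bit configuration, repeatedly lower the precision of whichever layer yields the largest measured latency saving per unit of accuracy-proxy cost, until no move fits within the budget. The sketch below is an illustrative assumption of such a heuristic, not the paper's actual algorithm; the function name, the lookup tables, and all numbers are hypothetical.

```python
def heuristic_refine(latency, sensitivity, budget):
    """Greedy budgeted precision lowering (illustrative sketch only).

    latency[l][b]: measured per-layer latency at bit-width b in {8, 4, 2}.
    sensitivity[l][b]: accuracy-proxy cost of running layer l at bit-width b.
    budget: cap on the total added sensitivity relative to the 8-bit model.
    """
    bits = {l: 8 for l in latency}   # start from the uniform 8-bit baseline
    spent = 0.0
    lower = {8: 4, 4: 2}             # candidate one-step precision reductions
    while True:
        best = None
        for l, b in bits.items():
            nb = lower.get(b)
            if nb is None:
                continue
            save = latency[l][b] - latency[l][nb]
            cost = sensitivity[l][nb] - sensitivity[l][b]
            # Skip moves that do not speed the layer up (latency is
            # non-monotonic in precision) or that overspend the budget.
            if save <= 0 or spent + cost > budget:
                continue
            score = save / max(cost, 1e-9)   # latency saved per unit cost
            if best is None or score > best[0]:
                best = (score, l, nb, cost)
        if best is None:
            break                    # no admissible move remains
        _, l, nb, cost = best
        bits[l], spent = nb, spent + cost
    total = sum(latency[l][b] for l, b in bits.items())
    return bits, total
```

With made-up tables for a two-layer network, `heuristic_refine` would drop both layers to 4 bits if that fits the budget, while a layer whose sub-byte kernel is slower on the target (negative `save`) is left at 8 bits; this mirrors the paper's point that profitable bit-widths depend on the deployment platform.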
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (eg, datasets, competitions, implementations, libraries)