ILD-MPQ: Learning-Free Mixed-Precision Quantization with Inter-Layer Dependency Awareness

Ruge Xu, Qiang Duan, Qibin Chen, Xinfei Guo

Published: 2024, Last Modified: 15 Nov 2024. AICAS 2024. CC BY-SA 4.0

Abstract: With the increasing adoption of mixed-precision quantization (MPQ) on edge AI devices, deep neural networks (DNNs) can achieve a satisfactory balance between accuracy and efficiency. However, many existing MPQ methods assume inter-layer independence in DNNs and optimize bit-width schemes at the single-layer level, leading to an additional loss of accuracy. Recently, several works have examined inter-layer dependency and applied it to finding optimal MPQ schemes. These works either rely on learning-based solutions that offer little explainability or lack empirical validation of their heuristics. In this paper, we investigate the factors that lead to inter-layer dependency and propose a learning-free, inter-layer dependency-aware search method using the NSGA-II algorithm, leveraging a novel per-layer influence metric. Evaluation results on MobileNetV2 and ResNet50 models demonstrate that the proposed method enhances the efficiency of post-training quantization (PTQ) models by 8.7%∼65.3% compared to state-of-the-art learning-free approaches, and limits the loss of model efficiency to 4.0%∼8.9% while reducing time costs by 90% compared to learning-based approaches, all under similar hardware consumption constraints.
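To make the multi-objective framing concrete, the following is a minimal sketch of the Pareto (non-dominated) filtering idea that NSGA-II builds on, applied to per-layer bit-width schemes. It is not the paper's method: the layer sizes, the sensitivity scores, and the error proxy are hypothetical placeholders (the abstract does not define the per-layer influence metric), and the candidates are enumerated exhaustively rather than evolved with NSGA-II's genetic operators and crowding distance.

```python
from itertools import product

# Hypothetical per-layer parameter counts and sensitivity scores (illustrative only).
PARAMS = [1000, 4000, 2000]
SENSITIVITY = [0.9, 0.2, 0.5]
BITS = (2, 4, 8)  # assumed candidate bit-widths per layer


def objectives(scheme):
    """Return (size_in_bits, error_proxy) for one bit-width scheme; both are minimized."""
    size = sum(p * b for p, b in zip(PARAMS, scheme))
    # Toy proxy: a layer's error contribution shrinks as its bit-width grows.
    error = sum(s / b for s, b in zip(SENSITIVITY, scheme))
    return size, error


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(schemes):
    """Keep the schemes whose objective vectors no other scheme dominates."""
    scored = [(s, objectives(s)) for s in schemes]
    return [s for s, f in scored if not any(dominates(g, f) for _, g in scored)]


# Enumerate all 3^3 schemes for the three toy layers and keep the trade-off frontier.
front = pareto_front(list(product(BITS, repeat=len(PARAMS))))
```

Each tuple in `front` is one bit-width assignment, one entry per layer, that cannot be improved in size without worsening the error proxy, or vice versa. A dependency-aware method would replace the additive toy proxy with a metric that accounts for how quantizing one layer affects others.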