Abstract: Network quantization is an effective compression technique for deep neural networks (DNNs), enabling on-device machine learning on consumer devices. Existing layer-wise quantization techniques allocate different bitwidths to different network layers. In this paper, we propose a finer-grained, filter-wise quantization technique based on differentiable neural architecture search (DNAS). We use a two-level network structure and a novel candidate generation algorithm, which together substantially prune the large search space. We validate the effectiveness of our technique with MobileNetV2 on ImageNet.
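The abstract does not give implementation details, but the core idea of filter-wise (as opposed to layer-wise) quantization can be illustrated with a minimal sketch: each output filter of a convolutional weight tensor is quantized with its own bitwidth. The symmetric uniform quantizer and the function names below are assumptions for illustration, not the paper's actual method (which searches the bitwidth assignment via DNAS).

```python
import numpy as np

def quantize_filter(w, bits):
    # Symmetric uniform quantization of a single filter to `bits` bits.
    # (Assumed quantizer; the paper does not specify one.)
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def filterwise_quantize(weight, bitwidths):
    # weight: (out_channels, in_channels, kH, kW).
    # Unlike layer-wise quantization, each output filter gets its own bitwidth.
    assert weight.shape[0] == len(bitwidths)
    return np.stack([quantize_filter(w, b) for w, b in zip(weight, bitwidths)])

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))
# A hypothetical per-filter bitwidth assignment; in the paper this
# assignment is the object of the DNAS search.
wq = filterwise_quantize(w, [2, 4, 8, 8])
```

A 2-bit filter can take at most 3 distinct values under this symmetric scheme, while an 8-bit filter retains far more resolution, which is why mixed per-filter bitwidths can trade accuracy against model size more finely than a single per-layer choice.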