Abstract: Network quantization is an effective compression technique for deep neural networks (DNNs), enabling on-device machine learning on consumer devices. Existing layer-wise quantization techniques allocate different bitwidths to different network layers. In this paper, we propose a finer-grained, filter-wise quantization technique based on differentiable neural architecture search (DNAS). We use a two-level network structure and a novel candidate generation algorithm, which together substantially prune the large search space. We validate the effectiveness of our technique with MobileNetV2 on ImageNet.
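The abstract does not give implementation details, but the core idea of filter-wise (as opposed to layer-wise) quantization can be illustrated with a minimal sketch: each output filter of a convolutional weight tensor is quantized with its own bitwidth. The symmetric uniform quantizer and the function names below are assumptions for illustration, not the paper's actual method (which searches the bitwidth assignment via DNAS).

```python
import numpy as np

def quantize_filter(w, bits):
    # Symmetric uniform quantization of a single filter to `bits` bits.
    # (Assumed quantizer; the paper does not specify one.)
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def filterwise_quantize(weight, bitwidths):
    # weight: (out_channels, in_channels, kH, kW).
    # Unlike layer-wise quantization, each output filter gets its own bitwidth.
    assert weight.shape[0] == len(bitwidths)
    return np.stack([quantize_filter(w, b) for w, b in zip(weight, bitwidths)])

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))
# A hypothetical per-filter bitwidth assignment; in the paper this
# assignment is the object of the DNAS search.
wq = filterwise_quantize(w, [2, 4, 8, 8])
```

A 2-bit filter can take at most 3 distinct values under this symmetric scheme, while an 8-bit filter retains far more resolution, which is why mixed per-filter bitwidths can trade accuracy against model size more finely than a single per-layer choice.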