Delving into Channels: Exploring Hyperparameter Space of Channel Bit Widths with Linear Complexity

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Deep Learning, Neural Network Compression, Rate-Distortion Theory
Abstract: Allocating different bit widths to different channels and quantizing them independently yields higher quantization precision and accuracy. Most prior works use an equal bit width to quantize all layers or channels, which is sub-optimal. On the other hand, exploring the hyperparameter space of channel bit widths is very challenging, as the search space grows exponentially with the number of channels, which can reach tens of thousands in a deep neural network. In this paper, we address the problem of efficiently exploring the hyperparameter space of channel bit widths. We formulate the quantization of deep neural networks as a rate-distortion optimization problem and present an ultra-fast algorithm to search for the bit allocation across channels. Our approach has only linear time complexity and finds the optimal bit allocation within a few minutes on a CPU. In addition, we provide an effective way to improve performance on target hardware platforms: we restrict the bit rate (size) of each layer so that as many weights and activations as possible can be stored on-chip, and incorporate these hardware-aware constraints into our objective function. The hardware-aware constraints add no overhead to the optimization and have a strongly positive impact on hardware performance. Experimental results show that our approach achieves state-of-the-art quantization results on four deep neural networks, ResNet-18, ResNet-34, ResNet-50, and MobileNet-v2, on ImageNet. Hardware simulation results demonstrate that our approach brings up to 3.5x and 3.0x speedups on two deep-learning accelerators, TPU and Eyeriss, respectively.
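The abstract does not detail the search algorithm itself. As a hedged illustration of how a rate-distortion formulation over independent channels can be searched in time linear in the number of channels, the sketch below uses standard Lagrangian bit allocation with bisection on the multiplier; it is not the paper's method, and the function names, the per-channel distortion/rate tables D and R, and the candidate bit widths are all hypothetical.

```python
# Hypothetical sketch (not from the paper): Lagrangian rate-distortion bit
# allocation over channels. Assumes each channel i has a precomputed
# distortion D[i][b] and rate R[i][b] for every candidate bit width b.
import numpy as np

def allocate_bits(D, R, target_rate, bit_widths=(2, 3, 4, 5, 6, 7, 8), iters=50):
    """Pick one bit width per channel so the total rate stays within target_rate.

    D, R: arrays of shape (num_channels, num_bit_widths).
    For a fixed multiplier lam, the cost D + lam * R decouples across channels,
    so each sweep is linear in the number of channels; lam is then bisected
    to meet the rate budget.
    """
    D, R = np.asarray(D, float), np.asarray(R, float)

    def sweep(lam):
        choice = np.argmin(D + lam * R, axis=1)        # per-channel argmin
        rate = R[np.arange(len(R)), choice].sum()
        return choice, rate

    lo, hi = 0.0, 1.0
    # Larger lam penalizes rate more, so grow hi until the budget is met.
    while sweep(hi)[1] > target_rate:
        hi *= 2.0
    for _ in range(iters):                             # bisection on lam
        mid = 0.5 * (lo + hi)
        _, rate = sweep(mid)
        if rate > target_rate:
            lo = mid
        else:
            hi = mid
    choice, rate = sweep(hi)
    return [bit_widths[c] for c in choice], rate
```

Per-layer rate limits of the kind described in the abstract could, under the same assumptions, be expressed by running such an allocation with each layer's on-chip budget as its target rate.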
One-sentence Summary: This paper addresses the problem of efficiently exploring the hyperparameter space of channel bit widths for neural network compression.