An Adaptive Logarithm Quantization Method for DNN Compression

2021 (modified: 02 Nov 2022), ICONIP (5) 2021
Abstract: The size and complexity of neural network models have grown rapidly in recent years, so inference with these models requires more computational and memory resources. Quantization is a promising way to reduce these requirements. Logarithm quantization reduces both model size and computational complexity, because time-consuming multiplications can be replaced with additions in the logarithm domain. However, previous logarithm quantization methods use a fixed logarithm base, so they cannot adapt to the data distribution or the bit-width budget, which degrades performance. To address this problem, we propose an adaptive quantization method that optimizes the quantization function. Our method first finds an optimized weight quantization function by minimizing the quantization loss of the model’s weights under a given bit-width budget. It then finds an optimized quantization function for the activations in a zero-shot manner. Because finding the optimal parameters is time-consuming, we propose a heuristic algorithm that solves the optimization problem quickly. Compared with previous logarithm quantization methods, our method achieves up to 72.53% higher Top-1 accuracy under the same bit-width constraint.
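The general idea of logarithm quantization with a tunable base can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the one-sign-bit layout, the assumption that values lie in [-1, 1], the reserved code for zero, and the brute-force search over candidate bases are all assumptions made for illustration. The paper's heuristic search and its zero-shot treatment of activations are not reproduced here.

```python
import math


def quantize_exponent(x, base, bits):
    """Integer exponent e such that base**e approximates |x| (x must be nonzero).

    Assumption: one bit stores the sign, and the remaining bits index
    exponents in [-(2**(bits-1) - 1), 0], which suits values with |x| <= 1.
    """
    levels = 2 ** (bits - 1)
    e = round(math.log(abs(x), base))      # round in the log domain
    return max(-(levels - 1), min(0, e))   # clip to the representable range


def log_quantize(x, base, bits):
    """Quantize x to sign(x) * base**e; zero is assumed to have its own code."""
    if x == 0.0:
        return 0.0
    return math.copysign(base ** quantize_exponent(x, base, bits), x)


def quantization_mse(values, base, bits):
    """Mean squared error between the values and their quantized versions."""
    return sum((v - log_quantize(v, base, bits)) ** 2 for v in values) / len(values)


def search_base(values, bits, candidates):
    """Brute-force stand-in for base selection: pick the lowest-error candidate."""
    return min(candidates, key=lambda b: quantization_mse(values, b, bits))
```

With base 2 and 4 bits, `log_quantize(0.3, 2.0, 4)` gives 0.25 (exponent -2) and `log_quantize(0.12, 2.0, 4)` gives 0.125 (exponent -3). Their product, 0.03125, equals 2 raised to the sum of the two exponents, which is why a multiplication can be replaced by an addition in the logarithm domain. Calling `search_base` with candidates such as 2, 2^(1/2), and 2^(1/4) shows how the preferred base can depend on the data and the bit-width.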