Abstract: In this work, we present a comprehensive study of post-training quantization methods for convolutional neural networks on two challenging tasks: classification and object detection. We further introduce a novel method that quantizes each layer to the smallest bit width that causes no accuracy degradation, so the model's layers are compressed to variable bit widths while preserving model quality. Experiments on classification and object detection show that our method compresses convolutional neural networks by up to 87% and 49% relative to 32-bit floating-point and naively quantized INT8 baselines, respectively, while maintaining the desired accuracy level.
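The per-layer bit-width selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: the names `quantize` and `smallest_safe_bits`, the uniform symmetric quantizer, the candidate bit widths, the accuracy-drop tolerance, and evaluating one quantized layer at a time (rather than cumulatively) are all simplifying assumptions.

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization of a weight tensor to `bits` bits
    # (a common simple scheme; the paper's quantizer may differ).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

def smallest_safe_bits(weights, evaluate, tolerance=0.01,
                       candidate_bits=(2, 4, 6, 8)):
    """For each layer, pick the smallest candidate bit width whose
    quantization keeps the model metric within `tolerance` of the
    floating-point baseline; fall back to 32 bits otherwise."""
    baseline = evaluate(weights)
    chosen = []
    for i, w in enumerate(weights):
        picked = 32  # keep full precision if no candidate is safe
        for b in sorted(candidate_bits):
            trial = list(weights)
            trial[i] = quantize(w, b)  # quantize only this layer
            if baseline - evaluate(trial) <= tolerance:
                picked = b
                break
        chosen.append(picked)
    return chosen
```

In this toy setting `evaluate` is any callback mapping a list of layer weights to a scalar quality metric; in the paper it would correspond to measuring task accuracy on a calibration set.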