Abstract: Quantization of neural networks enables faster inference, reduced memory usage, and lower energy consumption, all of which are crucial for deploying AI algorithms on devices. However, as precision decreases, quantized models may suffer performance degradation relative to their full-precision counterparts. Prior research has primarily focused on uniform quantization of network weights and activations, which struggles to capture the long-tailed distributions of these quantities. To address this issue, this paper introduces a non-uniform learned step-size quantization (nuLSQ) approach that optimizes individual step sizes for quantizing weights and activations. Evaluations on the CIFAR-10/100 and ImageNet datasets with ResNet, MobileNetV2, Swin-T, and ConvNeXT at 2-, 3-, and 4-bit precision demonstrate that nuLSQ outperforms other quantization methods. The code is available at https://github.com/DensoITLab/nuLSQ.
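To illustrate the general idea of learning individual step sizes so that the resulting quantization grid is non-uniform, here is a minimal sketch assuming PyTorch and a straight-through gradient estimator. The class and function names (e.g. `NonUniformQuantizer`) and all implementation details are hypothetical and are not taken from the paper; the authors' actual implementation is at the repository linked above.

```python
# Minimal, illustrative sketch of non-uniform quantization with learnable
# per-level step sizes (hypothetical; not the authors' nuLSQ implementation).
import torch
import torch.nn as nn


class _SnapToGridSTE(torch.autograd.Function):
    """Snap inputs to the nearest grid level. Straight-through gradient for x;
    gradients for the grid are accumulated per level."""

    @staticmethod
    def forward(ctx, x, levels):
        # Index of the nearest grid level for every element of x.
        idx = torch.argmin((x.unsqueeze(-1) - levels).abs(), dim=-1)
        ctx.save_for_backward(idx, levels)
        return levels[idx]

    @staticmethod
    def backward(ctx, grad_output):
        idx, levels = ctx.saved_tensors
        grad_levels = torch.zeros_like(levels)
        grad_levels.scatter_add_(0, idx.flatten(), grad_output.flatten())
        # Pass gradients straight through to x; route per-level gradients to the grid.
        return grad_output, grad_levels


class NonUniformQuantizer(nn.Module):
    """Quantize non-negative activations to 2**num_bits - 1 nonzero levels,
    with one learnable step size per level (a non-uniform grid)."""

    def __init__(self, num_bits: int = 2, init_step: float = 1.0):
        super().__init__()
        num_levels = 2 ** num_bits - 1
        self.steps = nn.Parameter(torch.full((num_levels,), init_step))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Grid 0, s1, s1+s2, ... built from learnable, positive step sizes.
        grid = torch.cat([x.new_zeros(1), torch.cumsum(self.steps.abs(), dim=0)])
        return _SnapToGridSTE.apply(x, grid)


if __name__ == "__main__":
    q = NonUniformQuantizer(num_bits=2)
    x = torch.rand(4, 8) * 3.0
    x_q = q(x)            # values snapped to the learned non-uniform grid
    x_q.sum().backward()  # gradients reach q.steps through the grid
    print(x_q.unique(), q.steps.grad)
```

Because each step size is a separate learnable parameter, the spacing between quantization levels can adapt to the data distribution during training, in contrast to a single shared step size that yields a uniform grid.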