Keywords: adaptive quantization, epistemic uncertainty, Bayesian neural network
TL;DR: We propose a novel approach to solving the adaptive quantization problem for neural networks based on epistemic uncertainty analysis.
Abstract: We propose a novel approach to solving the adaptive quantization problem in neural networks based on epistemic uncertainty analysis. The quantized model is treated as a Bayesian neural network with stochastic weights, where the mean values are employed to estimate the corresponding weights. Standard deviations serve as an indicator of uncertainty and determine the corresponding bit-width: a larger number of bits indicates lower uncertainty, and vice versa. We perform an extensive analysis of several algorithms within a novel framework for different convolutional and fully connected neural networks on open datasets, demonstrating the main advantages of the proposed approach. In particular, we introduce two novel algorithms for mixed-precision quantization. Quantile Inform utilizes uncertainty to allocate bit-width across layers, while Random Bits employs stochastic gradient-based optimization techniques to maximize the full likelihood of quantization. Using our approach, we reduce the average bit-width of the VGG-16 model to 3.05 with 90.5% accuracy on the CIFAR-10 dataset, compared to 91.9% for the non-quantized model. For the LeNet model trained on the MNIST dataset, we reduce the average bit-width to 3.16 and achieve 99.0% accuracy, almost equal to the 99.2% of the non-quantized model.
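The following is a minimal sketch (not the authors' implementation) of the uncertainty-driven bit allocation idea described in the abstract, assuming per-layer weight standard deviations are already available; the function name `allocate_bits`, the quantile grouping, and the candidate bit-widths are hypothetical illustration choices.

```python
import numpy as np

def allocate_bits(layer_stds, bit_choices=(2, 3, 4, 8)):
    """Hypothetical sketch of quantile-based mixed-precision allocation:
    layers whose weight standard deviations (used here as a proxy for
    epistemic uncertainty) fall in lower quantiles receive more bits,
    while layers in higher quantiles receive fewer bits."""
    stds = np.asarray(layer_stds, dtype=float)
    # Quantile edges split the layers into as many groups as there are bit choices.
    edges = np.quantile(stds, np.linspace(0, 1, len(bit_choices) + 1)[1:-1])
    # np.digitize assigns each layer's std to its quantile bucket (0 = smallest stds).
    buckets = np.digitize(stds, edges)
    # Lower uncertainty (small std) -> more bits; higher uncertainty -> fewer bits.
    sorted_bits = sorted(bit_choices, reverse=True)
    return [sorted_bits[b] for b in buckets]

# Example: four layers with increasing mean weight standard deviation.
print(allocate_bits([0.01, 0.05, 0.2, 0.8]))  # -> [8, 4, 3, 2]
```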
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13414