Efficient Bayesian DNN Compression through Sparse Quantized Sub-distributions

ICLR 2025 Conference Submission 12278 Authors

27 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Bayesian Deep Neural Networks, Quantization, Pruning, Variational Inference
Abstract: This paper presents a novel method that achieves model pruning and low-bit quantization simultaneously through Bayesian variational inference, compressing deep neural networks (DNNs) with minimal performance degradation. Unlike previous approaches that treat pruning and quantization as separate, sequential tasks, our method explores a unified optimization space, enabling more efficient compression. By leveraging a spike-and-slab prior combined with Gaussian Mixture Models (GMMs), we achieve both network sparsity and low-bit weight representations. Experiments on CIFAR-10, CIFAR-100, and SQuAD demonstrate that our approach achieves compression rates of up to 32x with less than a $1.3\%$ accuracy loss on the CIFAR datasets and a 1.66-point decrease in F1 score on SQuAD. Additionally, we show that Bayesian model averaging over the network posterior further mitigates the impact of quantization noise, leading to more robust compressed models. Our method outperforms existing techniques in both compression efficiency and accuracy retention, offering a promising solution for DNN compression.
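To make the abstract's central idea concrete, below is a minimal PyTorch sketch (not the authors' code) of how a spike-and-slab variational posterior can be combined with shared GMM-style quantization levels in a single layer: a per-weight "keep" probability realizes the spike at zero (pruning), while a penalty pulls the kept weight means onto a small set of shared centroids (low-bit quantization). All names here (`SpikeSlabGMMLinear`, `n_levels`, `tau`, the penalty weight) are hypothetical illustrations, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikeSlabGMMLinear(nn.Module):
    """Hypothetical layer: spike-and-slab pruning + GMM-centroid quantization."""
    def __init__(self, in_features, out_features, n_levels=4, tau=0.1):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.05)
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -5.0))
        # Logit of the per-weight "keep" probability (slab vs. spike at zero).
        self.keep_logit = nn.Parameter(torch.zeros(out_features, in_features))
        # Shared quantization levels = GMM component means (e.g. 2-bit -> 4 levels).
        self.levels = nn.Parameter(torch.linspace(-1.0, 1.0, n_levels))
        self.tau = tau  # temperature for the soft level assignment

    def forward(self, x):
        # Reparameterized sample from the slab, gated by the keep probability.
        eps = torch.randn_like(self.mu)
        w_slab = self.mu + eps * self.log_sigma.exp()
        keep = torch.sigmoid(self.keep_logit)
        return F.linear(x, keep * w_slab)  # spike contributes exactly zero

    def quantization_penalty(self):
        # Soft-assign each kept weight mean to its nearest shared level and
        # penalize the squared distance, so means collapse onto the centroids.
        d2 = (self.mu.unsqueeze(-1) - self.levels) ** 2     # (out, in, K)
        resp = F.softmax(-d2 / self.tau, dim=-1)            # responsibilities
        keep = torch.sigmoid(self.keep_logit)
        return (keep * (resp * d2).sum(-1)).mean()

    def compress(self):
        # Deterministic compressed weights: prune, then snap to nearest level.
        keep = (torch.sigmoid(self.keep_logit) > 0.5).float()
        idx = (self.mu.unsqueeze(-1) - self.levels).abs().argmin(-1)
        return keep * self.levels[idx]
```

A sketch of how this might be trained, including the abstract's Bayesian model averaging (averaging predictions over several posterior samples to smooth quantization noise); the penalty weight 1e-3 is an arbitrary illustrative choice:

```python
layer = SpikeSlabGMMLinear(784, 10)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits = torch.stack([layer(x) for _ in range(8)]).mean(0)  # model average
loss = F.cross_entropy(logits, y) + 1e-3 * layer.quantization_penalty()
```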
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12278