QUANTIZATION AWARE FACTORIZATION FOR DEEP NEURAL NETWORK COMPRESSION

22 Sept 2022 (modified: 12 Mar 2024) · ICLR 2023 Conference Withdrawn Submission
Keywords: tensor methods, compression, quantization
TL;DR: We propose a novel approach to neural network compression that performs tensor factorization and quantization simultaneously.
Abstract: Tensor approximation of convolutional and fully-connected weights is an effective way to reduce the number of parameters and FLOPs in neural networks. Due to the memory and power consumption limits of mobile and embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. Our goal is therefore to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Specifically, we propose to use the Alternating Direction Method of Multipliers (ADMM) to approximate a float tensor by a tensor in Canonical Polyadic (CP) format whose factors are close to their quantized versions. This yields a lower approximation error after quantization and a smaller drop in model prediction quality at the same compression rate.
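To make the idea concrete, below is a minimal sketch of how an ADMM-based, quantization-aware CP decomposition of a 3-way weight tensor could look. It is an illustration under assumptions, not the authors' exact algorithm: the function names (`qa_cp_admm`, `khatri_rao`, `unfold`, `quantize`), the symmetric per-tensor uniform quantizer, and the hyperparameters (rank, bit width, penalty `rho`, iteration count) are all hypothetical choices for demonstration. Each ADMM iteration alternates a regularized least-squares factor update, a projection of the factor onto the quantization grid, and a dual update that accumulates the gap between the two.

```python
# Illustrative sketch only: ADMM for a CP decomposition whose factors are
# pulled toward their quantized versions. All names and hyperparameters are
# assumptions made for this example, not the paper's exact method.
import numpy as np


def khatri_rao(X, Y):
    """Column-wise Khatri-Rao product: (J x R) and (K x R) -> (J*K x R)."""
    J, R = X.shape
    K, _ = Y.shape
    return (X[:, None, :] * Y[None, :, :]).reshape(J * K, R)


def unfold(T, mode):
    """Mode-n unfolding (C-order) of a 3-way tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def quantize(F, bits=8):
    """Symmetric uniform per-tensor quantization of a factor matrix (assumed scheme)."""
    scale = np.abs(F).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(F / scale) * scale


def qa_cp_admm(T, rank=8, bits=8, rho=1.0, n_iter=100, seed=0):
    """Quantization-aware CP decomposition of a 3-way tensor via ADMM (sketch)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((d, rank)) * 0.1 for d in T.shape]
    quants = [quantize(F, bits) for F in factors]   # auxiliary quantized copies
    duals = [np.zeros_like(F) for F in factors]     # scaled dual variables
    eye = np.eye(rank)

    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            M = khatri_rao(others[0], others[1])    # matches the C-order unfolding
            X = unfold(T, mode)
            # Factor update: least-squares CP fit plus a proximal term that
            # keeps the factor close to (quantized copy - dual).
            gram = 2.0 * M.T @ M + rho * eye
            rhs = 2.0 * X @ M + rho * (quants[mode] - duals[mode])
            factors[mode] = np.linalg.solve(gram, rhs.T).T
            # Auxiliary update: project the factor onto the quantization grid.
            quants[mode] = quantize(factors[mode] + duals[mode], bits)
            # Dual update: accumulate the factor/quantized-copy residual.
            duals[mode] += factors[mode] - quants[mode]

    return quants  # quantized CP factors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic low-rank tensor standing in for a reshaped convolutional weight.
    A, B, C = (rng.standard_normal((d, 4)) for d in (16, 16, 9))
    W = np.einsum("ir,jr,kr->ijk", A, B, C)
    Aq, Bq, Cq = qa_cp_admm(W, rank=4, bits=8, rho=1.0, n_iter=50)
    W_hat = np.einsum("ir,jr,kr->ijk", Aq, Bq, Cq)
    rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"relative approximation error with quantized factors: {rel_err:.4f}")
```

The point of the ADMM splitting here is that the quantization constraint is handled by a simple projection step, while the factor update remains a closed-form regularized least-squares problem, so the factors converge toward values that remain accurate after quantization rather than being quantized only after the fact.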
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (i.e., none of the above)
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2308.04595/code)