QUANTIZATION AWARE FACTORIZATION FOR DEEP NEURAL NETWORK COMPRESSION

22 Sept 2022 (modified: 14 Oct 2024) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: tensor methods, compression, quantization
TL;DR: We propose a novel approach to neural network compression that performs tensor factorization and quantization simultaneously.
Abstract: Tensor approximation of convolutional and fully-connected weights is an effective way to reduce the number of parameters and FLOPs in neural networks. Due to the memory and power-consumption limitations of mobile and embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. Our goal is therefore to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Specifically, we propose to use the Alternating Direction Method of Multipliers (ADMM) to approximate a float tensor by a tensor in Canonical Polyadic (CP) format whose factors are close to their quantized versions. This leads to a lower approximation error after quantization and a smaller quality drop in model predictions while maintaining the same compression rate.
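
To make the idea concrete, below is a minimal sketch of an ADMM scheme that fits CP factors while keeping them close to a quantization grid. This is not the authors' released code: it assumes uniform symmetric per-factor quantization and a TensorLy/NumPy setup, and the names `cp_admm_quant`, `quantize`, `rho`, `bits`, and `n_iter` are illustrative choices, not identifiers from the paper.

```python
# Illustrative sketch of ADMM-based CP factorization with quantized factors.
# Assumption: uniform symmetric quantization with a single per-factor scale.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from tensorly.tenalg import khatri_rao

def quantize(x, bits=8):
    """Project onto a uniform symmetric grid (per-tensor scale; an assumption)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def cp_admm_quant(T, rank, bits=8, rho=1.0, n_iter=50):
    """Approximate T by a CP tensor whose factors sit near the quantization grid."""
    # Warm start from an ordinary (unquantized) CP decomposition.
    weights, factors = parafac(tl.tensor(T), rank=rank, init="random", n_iter_max=100)
    factors = [np.asarray(f) for f in factors]
    factors[0] = factors[0] * np.asarray(weights)   # absorb CP weights into one factor
    quant = [quantize(f, bits) for f in factors]    # quantized copies of the factors
    duals = [np.zeros_like(f) for f in factors]     # scaled dual variables

    for _ in range(n_iter):
        for mode in range(T.ndim):
            # ALS-style least-squares step with a proximal term that pulls the
            # factor toward its quantized copy (minus the scaled dual).
            kr = khatri_rao(factors, skip_matrix=mode)
            unfolded = tl.unfold(tl.tensor(T), mode)
            gram = kr.T @ kr + rho * np.eye(rank)
            rhs = unfolded @ kr + rho * (quant[mode] - duals[mode])
            factors[mode] = np.linalg.solve(gram, rhs.T).T
            # Quantized-copy update (projection onto the grid) and dual ascent.
            quant[mode] = quantize(factors[mode] + duals[mode], bits)
            duals[mode] += factors[mode] - quant[mode]
    return quant  # quantized CP factors

# Example: compress a random 64x64x9 "convolution-like" weight tensor at rank 16.
W = np.random.randn(64, 64, 9).astype(np.float32)
q_factors = cp_admm_quant(W, rank=16, bits=8)
```

The proximal term in the factor update is what makes the factorization quantization-aware: each ALS step trades reconstruction error against distance to the quantized copy, so the final factors can be rounded to the grid with little additional approximation error.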
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/quantization-aware-factorization-for-deep/code)