Quantized sparse PCA for neural network weight compression

Andrey Kuzmin; Mart Van Baalen; Markus Nagel; Arash Behboodi

Published: 28 Jan 2022, Last Modified: 13 Feb 2023

ICLR 2022 Submitted

Readers: Everyone

Keywords: Model Compression, neural network quantization, sparse principal component analysis, vector quantization

Abstract: We introduce a novel weight compression method that stores weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weight tensors. The underlying matrix factorization problem can be viewed as a quantized sparse PCA problem and solved through iterative projected gradient descent methods. Seen as a unification of weight SVD, vector quantization, and sparse PCA, our method achieves or is on par with state-of-the-art trade-offs between accuracy and model size. Unlike vector quantization, our method is applicable to the moderate compression regime as well as the extreme compression regime.

One-sentence Summary: A neural network weight compression method based on sparse quantized PCA.
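
To make the factorization described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It approximates a weight matrix W by a product C @ Z, keeps Z sparse by hard thresholding after each gradient step (one possible projection), and rounds both factors to a uniform grid at the end. The names `factorize`, `keep_top_k`, `quantize_uniform`, `rank`, `nnz`, and `bits` are hypothetical, and the choices of top-k projection, per-tensor symmetric quantization, and quantizing only after optimization are simplifying assumptions; the abstract does not specify these details.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Round x to a symmetric uniform grid (assumed per-tensor scale)."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / qmax
    return np.round(x / scale) * scale   # at most 2*qmax + 1 distinct values

def keep_top_k(x, k):
    """Projection onto k-sparse matrices: keep the k largest-magnitude entries."""
    if k >= x.size:
        return x.copy()
    flat = x.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)

def factorize(W, rank, nnz, bits=4, steps=200, seed=0):
    """Alternating projected gradient descent on ||W - C @ Z||_F^2.

    Z is projected onto the set of matrices with at most `nnz` non-zeros
    after every step; both factors are quantized once at the end.
    """
    rng = np.random.default_rng(seed)
    m, n = W.shape
    C = 0.1 * rng.standard_normal((m, rank))
    Z = 0.1 * rng.standard_normal((rank, n))
    for _ in range(steps):
        # Gradient step on C with step size 1 / ||Z||_2^2.
        R = C @ Z - W
        C = C - (R @ Z.T) / (np.linalg.norm(Z, 2) ** 2 + 1e-8)
        # Gradient step on Z with step size 1 / ||C||_2^2, then sparsify.
        R = C @ Z - W
        Z = Z - (C.T @ R) / (np.linalg.norm(C, 2) ** 2 + 1e-8)
        Z = keep_top_k(Z, nnz)
    return quantize_uniform(C, bits), quantize_uniform(Z, bits)
```

At inference time, a layer would store only the two quantized factors and reconstruct its weights as `C_q @ Z_q` when needed. The compression ratio then depends on `rank`, `nnz`, and `bits`, which in this sketch are free parameters chosen by the user.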
