Compressed neural architecture utilizing dimensionality reduction and quantization

Published: 01 Jan 2023 · Last Modified: 19 May 2025 · Appl. Intell. 2023 · CC BY-SA 4.0
Abstract: Deep learning has become the default solution for a wide range of problems. However, one drawback of deep learning-based solutions is that the models are very large and expensive to process. As such, they are difficult to deploy on small or embedded devices and to transmit across the web. In light of this problem, this paper presents a novel method for converting large neural networks into lightweight, compressed models. Our method uses Principal Component Analysis (PCA), a dimensionality reduction algorithm, to decompose the network weights into smaller matrices, yielding a new, compressed architecture. The compressed model is then further trained to compensate for the error introduced by the lossy compression, and the parameters are finally stored after quantization. Experiments on benchmark datasets using standard models show that we achieve high compression, with compression ratios between 5× and 35× depending on the complexity of the model, with little to no drop in model accuracy. Comparison with other state-of-the-art methods shows that the performance of our compression method is comparable, and in certain cases better. To our knowledge, this is the first work in which dimensionality reduction and quantization are combined to create a new, compressed model.
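The abstract only outlines the pipeline, so the following is a minimal NumPy sketch of the two core steps, not the authors' code: a weight matrix is approximated by two smaller PCA factors, which are then stored with 8-bit quantization. The rank k, the uint8 scheme, and all helper names here are illustrative assumptions.

```python
import numpy as np

def pca_compress(W, k):
    """Approximate W (m x n) with two rank-k factors via PCA."""
    mean = W.mean(axis=0, keepdims=True)
    Wc = W - mean
    # Principal directions come from the SVD of the centered matrix.
    U, S, Vt = np.linalg.svd(Wc, full_matrices=False)
    P = Vt[:k].T                      # (n, k) projection basis
    Z = Wc @ P                        # (m, k) scores
    return Z, P, mean                 # W ~= Z @ P.T + mean

def quantize_uint8(A):
    """Uniform affine quantization of A to 8 bits."""
    lo, hi = A.min(), A.max()
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant matrix
    q = np.round((A - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

# Example: compress a 512 x 256 layer to rank 32.
W = np.random.randn(512, 256).astype(np.float32)
Z, P, mean = pca_compress(W, k=32)
qZ, sZ, lZ = quantize_uint8(Z)
qP, sP, lP = quantize_uint8(P)
W_hat = dequantize(qZ, sZ, lZ) @ dequantize(qP, sP, lP).T + mean
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In this toy setting the two factors hold m·k + n·k values instead of m·n (about a 5× reduction at the assumed sizes), and storing them as uint8 rather than float32 adds a further 4×; the retraining step described in the abstract would fine-tune the factors before the final quantization.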