A Multi-Decomposition Method for Compressing Larger AI Models Based on Reinforcement Learning

Jian Tang; Kun Xie; Jigang Wen; Xiaocan Li; Jiazheng Tian; Gaogang Xie; Kenli Li

A Multi-Decomposition Method for Compressing Larger AI Models Based on Reinforcement Learning

Jian Tang, Kun Xie, Jigang Wen, Xiaocan Li, Jiazheng Tian, Gaogang Xie, Kenli Li

26 Sept 2024 (modified: 27 Oct 2024)ICLR 2025 Conference Desk Rejected SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: deep neural network, low rank decomposition, multiple decomposition methods, reinforcement learning.

TL;DR: This paper proposes an innovative model compression scheme that combines various decomposition methods with a reinforcement learning-based optimization framework, jointly optimizing model compression rate and accuracy.

Abstract: With the development of modern deep neural network (DNN), the scale of parameters is increasing, making it difficult to deploy models for use on resource-constrained edge devices. To address this issue, model compression is necessary, and using low-rank matrix decomposition to compress DNN models is an effective research approach. However, traditional studies on low-rank decomposition compression typically apply a single matrix decomposition method to each parameter matrix in the neural network, without considering the structural characteristics of each layer in AI models, thus failing to achieve the optimal compression effect. Therefore, this paper proposes, for the first time, a scheme for model compression using multiple decomposition methods, selecting the most suitable decomposition method for each layer in the model. However, to truly implement this approach, it is essential to balance model accuracy and compression cost. To address this, we propose a joint optimization paradigm that simultaneously optimizes model accuracy and compression rate. We also introduce a framework LMFBRL based on reinforcement learning that jointly selects the optimal decomposition method and rank. Tests were conducted on five models such as LeNet-300, ResNet-20, and Vgg-16. Compared to singly using the MF method for compressing the LeNet300 model, our approach has shown an improvement of 3.6% in compression rate and a 1.8% increase in accuracy. The test results validate the effectiveness of the algorithm proposed in this paper.

Primary Area: other topics in machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6465

Loading