Abstract: Low-rank factorization is a popular technique for model compression. These methods minimize the squared error of the approximation to the original weight matrix and then rely on fine-tuning to recover any rise in loss. However, this objective is inconsistent with optimizing model performance: minimizing the matrix approximation error does not directly minimize the degradation of the task loss, and the fine-tuning process itself cannot be analyzed in a unified way. As a result, model performance degrades whenever fine-tuning is omitted. We analyze this previously unexplored problem and propose, for the first time, a lossless low-rank weight factorization strategy that requires no fine-tuning. First, we analyze the relationship between low-rank factorization and the model's optimization objective via mathematical calculus, and experimentally establish the range of factorization-error perturbations that the model performance can tolerate. We then recast low-rank factorization as a numerical rank-defect problem under inequality constraints and propose a new objective that jointly accounts for matrix factorization error and model performance. To solve this problem, we propose two optimization algorithms under numerical rank-defect: a lossless optimization algorithm, which greedily optimizes the model loss while ensuring compression, and a compact optimization algorithm, which greedily minimizes model size while keeping the model lossless. Our method corrects the objective of low-rank factorization at inference time and can directly compress a task-specific model without fine-tuning to obtain a lossless model. Its effectiveness is validated on a wide range of vision and language tasks and datasets.
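To make the idea of compressing under a performance constraint concrete, here is a minimal conceptual sketch, not the paper's algorithm: it greedily reduces the rank of a single weight matrix via truncated SVD while a surrogate loss increase stays within a tolerance, loosely mirroring the "compact" objective of minimizing size while remaining lossless. The `loss_fn`, the tolerance, and the toy probe input are illustrative assumptions.

```python
# Conceptual sketch (assumed setup, not the paper's method): greedily lower the
# rank of W via truncated SVD while the loss increase stays within a tolerance.
import numpy as np

def greedy_rank_reduction(W, loss_fn, tol=1e-3):
    """Return the lowest-rank SVD approximation of W whose loss increase
    over the original matrix stays within `tol` (hypothetical criterion)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    base_loss = loss_fn(W)
    best = W
    # Sweep ranks from full down to 1; keep the smallest rank still "lossless".
    for r in range(len(s), 0, -1):
        W_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
        if loss_fn(W_r) - base_loss <= tol:
            best = W_r
        else:
            break
    return best

# Toy usage: the "loss" is a quadratic surrogate on a random probe input.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
x = rng.normal(size=(64,))
target = W @ x
loss = lambda M: float(np.mean((M @ x - target) ** 2))
W_compressed = greedy_rank_reduction(W, loss)
print(np.linalg.matrix_rank(W_compressed))
```

In practice the stopping criterion would be tied to the actual task loss and applied jointly across layers; this sketch only illustrates the trade-off between rank and loss that the proposed objective formalizes.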