Keywords: large language models, model compression, efficient fine-tuning
TL;DR: We approximate each weight matrix as the sum of a low-rank matrix and a sparse matrix and efficiently fine-tune the compressed model.
Abstract: Large Language Models (LLMs) have recently emerged as a significant advancement in natural language processing; however, their large scale and computational complexity make deployment a challenge. Model pruning is a widely used post-training strategy to reduce LLMs' memory and computation needs. Despite notable progress, these techniques incur a reduction in performance and require post-pruning fine-tuning to recover it. To address these problems, we introduce $\textbf{ELSA}$, a novel method combining pruning and low-rank decomposition for better compression and recovery. We first use an alternating projections method to decompose the weight matrices into sparse matrices and low-rank matrices, which we validate from both theoretical and empirical perspectives; we then freeze the sparse matrices and update the low-rank matrices to efficiently recover performance. To demonstrate the effectiveness and efficiency of the method, we conduct experiments on various language tasks (seven zero-shot tasks and language modeling) and on models from different families (LLaMA, OPT, and Qwen) at different scales. The experiments show that the method outperforms state-of-the-art pruning methods while maintaining comparable inference efficiency.
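The abstract describes decomposing each weight matrix into a sparse plus a low-rank component via alternating projections. Below is a minimal, illustrative sketch of such an alternating-projections decomposition in PyTorch; the function name, the `rank` and `num_sparse` hyperparameters, and the fixed iteration count are assumptions for illustration and are not taken from the paper.

```python
import torch

def low_rank_plus_sparse(W, rank, num_sparse, n_iters=30):
    """Alternating projections (illustrative sketch, not the paper's exact algorithm):
    approximate W ≈ L + S with rank(L) <= `rank` and S having at most
    `num_sparse` nonzero entries."""
    L = torch.zeros_like(W)
    S = torch.zeros_like(W)
    for _ in range(n_iters):
        # Project the residual W - L onto the sparse set by keeping
        # its `num_sparse` largest-magnitude entries.
        R = W - L
        S = torch.zeros_like(R)
        _, idx = torch.topk(R.abs().flatten(), num_sparse)
        S.view(-1)[idx] = R.view(-1)[idx]
        # Project the residual W - S onto the set of rank-`rank` matrices
        # via a truncated SVD.
        U, sigma, Vh = torch.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sigma[:rank]) @ Vh[:rank, :]
    return L, S
```

In the recovery stage described in the abstract, S would be kept frozen while the (factored) low-rank component L is fine-tuned, so only a small number of parameters are updated.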
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18533