Keywords: Post-training Model Compression
Abstract: This paper introduces a new method for the low-rank compression of large language models. Existing techniques typically compress each weight matrix individually, overlooking the dependencies among weights within a transformer block. To address this limitation, we formulate a joint optimization problem that finds optimal low-rank weights for an entire transformer block, minimizing the block's output reconstruction error. Our formulation incorporates key architectural elements, including residual connections and normalization layers. We then introduce SLIM, an efficient algorithm for solving this optimization problem. Experimental results demonstrate that our method consistently improves task accuracy by over 5\% compared to existing techniques across a range of compression ratios and model families.
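A hedged sketch of the block-level objective described in the abstract (the notation here, $f$ for the transformer block, $X$ for calibration inputs, $W_i$ for the block's weight matrices, and $r_i$ for target ranks, is assumed for illustration, not taken from the paper):

$$
\min_{\{\widehat{W}_i\}} \;\left\| f\!\left(X; \{W_i\}\right) - f\!\left(X; \{\widehat{W}_i\}\right) \right\|_F^2
\quad \text{s.t.} \quad \operatorname{rank}\!\left(\widehat{W}_i\right) \le r_i \;\; \forall i,
$$

where $f$ applies the full block, including residual connections and normalization layers, so the low-rank factors are optimized jointly rather than one weight matrix at a time.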
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21471