Keywords: LLM, model compression, matrix decomposition
TL;DR: A framework that expands matrix decomposition for LLM compression beyond SVD
Abstract: Large Language Models (LLMs) have significantly advanced AI with their exceptional performance across a wide range of tasks. However, their extensive computational requirements restrict their use on devices with limited resources. While recent compression methods based on low-rank matrices show potential solutions, they often suffer from significant loss of accuracy or introduce substantial overhead in parameters and inference time. In this paper, we introduce Modular Decomposition (MoDeGPT), a new, efficient, and structured compression framework that overcomes these limitations. MoDeGPT jointly decomposes pairs of consecutive subcomponents within Transformer blocks, reduces hidden dimensions through output reconstruction on a larger structural scale than conventional low-rank methods, and repurposes three classical matrix decomposition algorithms—Nyström approximation, CR decomposition, and SVD—to ensure bounded errors in our novel decomposition approach. Our experiments show that MoDeGPT, without relying on backward propagation, consistently matches or surpasses the performance of prior techniques that depend on gradient information, while achieving a 98% reduction in compute costs when compressing a 13B-parameter model. On LLaMA-2/3 and OPT models, MoDeGPT retains 90-95% of zero-shot performance with compression rates of 25-30%. The compression process can be completed on a single GPU in a few hours, boosting inference throughput by up to 46%.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9191