AA-SVD: Anchored and Adaptive SVD for Large Model Compression

ICLR 2026 Conference Submission 24835 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Compression, SVD, Efficient LLM, Low-rank decomposition
Abstract: Pretrained large language and vision-language models have demonstrated remarkable capabilities in recent years, but their ever-increasing size poses challenges for deployment and accessibility. Model compression offers a path toward democratizing access, yet many existing approaches either require costly retraining or incur substantial performance degradation. To address this, we introduce a fast SVD-based truncation framework that compresses pretrained billion-parameter models without retraining. Existing SVD-based approaches either optimize only on the original inputs, ignoring distribution shifts introduced by upstream compression and thus propagating errors forward, or rely only on the shifted inputs and risk drifting away from the original outputs; our approach accounts for both. By anchoring each compressed layer to the original outputs while explicitly modeling input distribution shifts, our method identifies optimal low-rank approximations that maintain functional equivalence with the uncompressed network, thereby preserving the behavior of the full model. Experiments across language and vision-language models of varying scales show that our method not only achieves favorable trade-offs between compression ratio and task accuracy, but also outperforms existing baselines, particularly at low compression ratios, where the gap widens as compression becomes more aggressive, offering a practical solution for efficient, large-scale model deployment.
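To make the anchored, shift-aware objective concrete, below is a minimal sketch for a single linear layer; it is an illustration under assumptions, not the authors' implementation. Given the original weight W, calibration activations X_orig seen by the uncompressed model, and X_shift produced after upstream layers have been compressed, it finds a rank-r map M minimizing ||W X_orig - M X_shift||_F via reduced-rank regression (least-squares fit followed by projection onto the top output directions). The function name, shapes, and the plain pseudoinverse solve are all hypothetical choices for illustration.

```python
import numpy as np

def anchored_lowrank(W, X_orig, X_shift, rank):
    """Hypothetical sketch of an anchored, shift-aware low-rank solve.

    Finds thin factors (A, B) with A @ B of rank `rank` minimizing
    ||W @ X_orig - (A @ B) @ X_shift||_F, so the compressed layer applied to
    shifted inputs stays anchored to the original layer's outputs.
    Shapes: W (d_out, d_in), X_orig / X_shift (d_in, n_samples).
    """
    Y = W @ X_orig                          # anchor: outputs of the uncompressed layer
    M_full = Y @ np.linalg.pinv(X_shift)    # unconstrained least-squares map on shifted inputs
    Y_hat = M_full @ X_shift                # fitted outputs of the unconstrained map
    U, _, _ = np.linalg.svd(Y_hat, full_matrices=False)
    U_r = U[:, :rank]                       # top-r output directions
    A = U_r                                 # (d_out, rank)
    B = U_r.T @ M_full                      # (rank, d_in); A @ B is the rank-r anchored map
    return A, B

if __name__ == "__main__":
    # Toy usage: simulate a small input shift from upstream compression.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 128))
    X_orig = rng.standard_normal((128, 256))
    X_shift = X_orig + 0.05 * rng.standard_normal((128, 256))
    A, B = anchored_lowrank(W, X_orig, X_shift, rank=16)
    err = np.linalg.norm(W @ X_orig - (A @ B) @ X_shift) / np.linalg.norm(W @ X_orig)
    print(f"relative anchored reconstruction error: {err:.3f}")
```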
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24835