Accurate and Efficient Singular Value Decomposition For LLMs via Decay-aware Rank Allocation and Feature-Preserved Weight Update

ICLR 2026 Conference Submission11170 Authors

18 Sept 2025 (modified: 20 Nov 2025)
Keywords: Singular Value Decomposition, Decay-Aware Rank Allocation, Feature Preserved Parameter Updating, Large Language Models
TL;DR: Accurate and Efficient Singular Value Decomposition for LLMs via Decay-Aware Rank Allocation and Feature-Preserved Weight Update
Abstract: Singular Value Decomposition (SVD) provides a hardware-agnostic and effective paradigm for compressing and accelerating Large Language Models (LLMs) by decomposing and truncating weight matrices, followed by weight updates to restore accuracy. However, SVD-based compression faces two major challenges: **(1) Rank Selection Problem:** Optimizing truncation and update ranks constitutes a high-dimensional combinatorial problem. Existing solutions rely on computationally expensive search, leading to both suboptimal performance and diminished efficiency. **(2) Limited Accuracy Restoration:** The sequential weight update strategy employed by state-of-the-art approaches (e.g., SVD-LLM) results in Hessian anisotropy, which hampers accuracy recovery and slows convergence. To overcome these challenges, we introduce DF-SVD, which integrates: **(1) Decay-Aware Rank Allocation:** We derive and validate a correlation between the decay characteristics of each weight's singular value spectrum and its importance. This enables dynamic, layer- and weight-specific rank allocation, ensuring high fidelity without costly search. **(2) Feature-Preserved Weight Update:** We introduce a theoretically grounded update strategy that fixes the truncated weight matrix $V^{\top}S^{-1}$ along with the principal components of $U\Sigma$, while updating only the minor components. This design ensures Hessian isotropy, achieving superior accuracy restoration and faster convergence. DF-SVD not only significantly outperforms baselines in accuracy, but also completes compression in just 30 minutes, achieving speedups of $7\times$, $11\times$, and $16\times$ over SVD-LLM, ASVD, and Dobi-SVD, respectively. DF-SVD directly correlates the singular spectrum with training-free rank selection and boosts Hessian isotropy, paving the way for a new paradigm in accurate and efficient SVD-based LLM compression.
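The two ingredients the abstract names — scoring each weight matrix by how fast its singular value spectrum decays, then truncating it at the allocated rank — can be illustrated with a minimal NumPy sketch. Note this is an assumption-laden toy, not the paper's method: the functions `spectral_decay`, `allocate_ranks`, and `truncate` are hypothetical names, the energy-based decay score and proportional allocation rule are stand-ins for whatever criterion DF-SVD actually derives, and the update step (fixing $V^{\top}S^{-1}$ and the principal components of $U\Sigma$) is omitted entirely.

```python
import numpy as np

def spectral_decay(s, tau=0.99):
    """Fraction of singular values needed to capture tau of the
    spectral energy. A fast-decaying spectrum gives a small fraction,
    i.e. the matrix is highly compressible."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return (np.searchsorted(energy, tau) + 1) / len(s)

def allocate_ranks(weights, budget_ratio=0.5):
    """Hypothetical decay-aware allocation: matrices whose spectra
    decay slowly (higher score) receive proportionally more rank,
    under a fixed overall rank budget. Search-free by construction."""
    decays = np.array([
        spectral_decay(np.linalg.svd(W, compute_uv=False))
        for W in weights
    ])
    shares = decays / decays.sum()
    total = sum(min(W.shape) for W in weights) * budget_ratio
    return [max(1, int(round(total * sh))) for sh in shares]

def truncate(W, r):
    """Rank-r SVD approximation W ~= (U_r Sigma_r) V_r^T, the two
    low-rank factors that replace the dense weight after compression."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

In an actual pipeline the two factors `U[:, :r] * s[:r]` and `Vt[:r]` would be stored as two small linear layers rather than multiplied back together, which is where the memory and compute savings come from.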
Primary Area: generative models
Submission Number: 11170