Keywords: SVD, Sensitivity list, Layer-wise
TL;DR: Fine-grained sensitivity estimation improves rank allocation for SVD-based LLM compression.
Abstract: Large language models achieve strong performance across many tasks, but their growing size makes deployment on resource-constrained devices challenging. Low-rank decomposition provides an effective compression approach, yet existing SVD-based methods often rely on uniform or coarse-grained rank allocation, which can overlook fine-grained variations in layer sensitivity. In this work, we propose a fine-grained sensitivity-based framework for SVD-based LLM compression that profiles the relationship between rank and perplexity at a higher resolution and uses the resulting fine-grained sensitivity scores to guide adaptive rank allocation. Our method assigns non-uniform ranks across layers under a target compression ratio, prioritizing compression steps that incur the least performance degradation per parameter removed. Experiments on LLaMA-2-7B show that our approach achieves better compression-performance trade-offs than prior SVD-based compression methods.
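The abstract's core idea, sensitivity-guided non-uniform rank allocation with SVD truncation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `sensitivity` profile (a hypothetical per-layer, per-component perplexity-cost table), the greedy allocation loop, and the uniform per-component parameter cost are all simplifying assumptions introduced here.

```python
import numpy as np

def truncated_svd(W, rank):
    """Factor W into A @ B of the given rank via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # scale columns of U by singular values
    B = Vt[:rank]
    return A, B

def allocate_ranks(sensitivity, full_rank, keep_ratio):
    """Greedy non-uniform rank allocation (illustrative sketch).

    sensitivity[l][r]: hypothetical fine-grained profile giving the
    perplexity increase from dropping the r-th SVD component of layer l.
    keep_ratio: fraction of total rank-1 components to retain, standing
    in for a target compression ratio (assumes equal parameter cost per
    component across layers, which real layers generally do not have).
    """
    ranks = [full_rank] * len(sensitivity)
    budget = int(full_rank * len(sensitivity) * keep_ratio)
    while sum(ranks) > budget:
        # drop the single component whose removal hurts perplexity least
        l = min(range(len(ranks)),
                key=lambda i: sensitivity[i][ranks[i] - 1]
                if ranks[i] > 1 else float("inf"))
        ranks[l] -= 1
    return ranks
```

A layer whose profiled cost per dropped component is low is compressed aggressively, while a sensitive layer keeps most of its rank, matching the abstract's "smaller degradation per parameter reduction" criterion.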
Submission Number: 54