Mitigating Error Propagation in Low-Rank Approximation of Large Models via Distribution-Aware Whitening
Keywords: Low-Rank Approximation, Large Models, Post-training Compression
TL;DR: We propose a distribution-aware low-rank approximation framework that mitigates error propagation by dynamically whitening feature inputs, achieving more stable and effective model compression.
Abstract: Low-rank approximation has emerged as a cornerstone technique for model compression and parameter-efficient fine-tuning, enabling substantial reductions in computation and memory without altering model architectures. However, existing approaches often overlook the shifts in feature distributions induced by the approximation process, which can lead to error amplification and unstable inference.
We propose a distribution-aware whitening framework that dynamically whitens layer inputs according to their evolving feature distributions, enforcing second-order isotropy.
This ensures that the components discarded by the low-rank approximation are those with minimal impact on model outputs, thereby minimizing cumulative approximation error across layers.
We theoretically analyze how distribution misalignment leads to error propagation and demonstrate that our approach achieves tighter control over layerwise distortion.
Extensive experiments across various large language models demonstrate the superiority of our method in post-training compression. Moreover, our method serves as an effective initialization for LoRA-style parameter-efficient fine-tuning.
Our findings highlight the importance of considering feature distributions in low-rank approximations, paving the way for reliable and effective model compression strategies.
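The whitening idea can be sketched in a few lines. The following is an illustrative NumPy sketch, not the paper's exact algorithm: the function name `whitened_low_rank` and the parameters `rank` and `eps` are our own, and the example assumes access to a calibration batch of layer inputs `X`. It truncates a weight matrix in the geometry induced by the input second-moment matrix, so the discarded singular directions are those least excited by the actual feature distribution.

```python
import numpy as np

def whitened_low_rank(W, X, rank, eps=1e-6):
    """Truncate W to `rank` in the geometry induced by the inputs X (sketch)."""
    d_in = X.shape[1]
    # Second-moment matrix of the inputs; eps keeps the Cholesky factor stable.
    cov = X.T @ X / X.shape[0] + eps * np.eye(d_in)
    S = np.linalg.cholesky(cov)                       # cov = S @ S.T
    # Whitened SVD: truncating W @ S measures the approximation error in the
    # geometry of the (anisotropic) input distribution, not raw weight space.
    U, sig, Vt = np.linalg.svd(W @ S, full_matrices=False)
    return (U[:, :rank] * sig[:rank]) @ Vt[:rank] @ np.linalg.inv(S)

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
# Anisotropic calibration inputs: feature scales span three orders of magnitude.
X = rng.standard_normal((512, 32)) * np.logspace(0, 3, 32)
W_whiten = whitened_low_rank(W, X, rank=8)

# Baseline: plain truncated SVD of W, which ignores the input distribution.
U, sig, Vt = np.linalg.svd(W, full_matrices=False)
W_plain = (U[:, :8] * sig[:8]) @ Vt[:8]

err_whiten = np.linalg.norm(X @ (W - W_whiten).T)
err_plain = np.linalg.norm(X @ (W - W_plain).T)
```

Under such anisotropic inputs, the whitened truncation yields a lower output-space error (`err_whiten < err_plain`) at the same rank, which is the intuition behind making the approximation distribution-aware.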
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2980