Keywords: Efficient model architectures
TL;DR: We propose SRR, a quantization method that preserves dominant subspaces before quantizing residuals. It naturally generalizes to activation-aware settings and improves PTQ and QPEFT performance under low-bit constraints.
Abstract: Post-training quantization (PTQ) enables efficient deployment of LLMs by converting weights to low-bit formats, but often degrades accuracy.
Quantization error reconstruction (QER) mitigates this by adding a low-rank correction term.
However, existing QER methods typically quantize weights before identifying low-rank structure, discarding information they later attempt to recover.
We propose Structured Residual Reconstruction (SRR), a simple yet effective reformulation of QER that first preserves dominant spectral directions and quantizes only the residual tail.
The final approximation combines the preserved low-rank structure with a quantized residual, yielding improved fidelity under the same rank constraint.
SRR generalizes to activation-aware settings by selecting dominant components according to their contributions in both the original weight space and the activation-weighted space.
We also apply SRR to quantized parameter-efficient fine-tuning (QPEFT) by freezing the preserved subspace and updating only the residual component during fine-tuning, which stabilizes training and leads to better adaptation.
Across both PTQ and QPEFT, SRR consistently improves performance under fixed rank constraints, providing an effective framework for quantization-aware compression.
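The abstract describes the decomposition only at a high level; the following is a minimal PyTorch sketch of the core idea, assuming a plain (not activation-weighted) truncated SVD for the preserved subspace and a simple per-tensor symmetric round-to-nearest quantizer for the residual. The function name `srr_quantize` and its `rank`/`n_bits` arguments are illustrative choices, not the authors' implementation.

```python
import torch

def srr_quantize(W: torch.Tensor, rank: int = 32, n_bits: int = 4):
    """Sketch of Structured Residual Reconstruction (SRR).

    1. Preserve the top-`rank` spectral directions of W exactly.
    2. Quantize only the residual tail.
    Returns the preserved low-rank component L and the quantized residual R_q,
    so that W is approximated as L + R_q.
    """
    # Step 1: keep the dominant subspace before any quantization.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # Step 2: quantize only the residual tail
    # (illustrative symmetric round-to-nearest quantizer).
    R = W - L
    qmax = 2 ** (n_bits - 1) - 1
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax) * scale

    # Final approximation: preserved low-rank structure + quantized residual.
    return L, R_q
```

In the QPEFT setting described above, the factors of the preserved component would be kept frozen while only the residual part is updated during fine-tuning; the activation-aware variant would instead select the preserved directions using contributions measured in both the original and activation-weighted spaces.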
Submission Number: 45