Keywords: LoRA, quantization, Large Language Models (LLMs), Alternating Least Squares (ALS)
TL;DR: This paper presents a novel approach that enhances quantization performance for Large Language Models (LLMs) by improving low-rank matrix modeling with activation values and Alternating Least Squares (ALS).
Abstract: The rapid advancement of Large Language Models (LLMs) has intensified the demand for efficient methodologies that balance model performance with hardware constraints, particularly GPU memory limitations. Quantization has emerged as a prominent technique for model compression, with QLoRA demonstrating the potential of low-rank matrices for quantization error compensation by integrating LoRA-based efficient fine-tuning. However, even LoRA fine-tuning requires substantial resources for models with tens or hundreds of billions of parameters. In this work, we explore low-rank matrix compensation for quantization errors without global LoRA fine-tuning, employing Alternating Least Squares (ALS) to better model and solve the optimization problem.
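To make the ALS component concrete, the sketch below fits a rank-r product A·B to the quantization error E = W − W_q by alternating two least-squares solves. This is a minimal illustration, not the paper's exact algorithm; the rank, iteration count, and initialization are assumptions.

```python
# Minimal sketch: compensate the quantization error E = W - W_q with a
# rank-r product A @ B fitted by plain Alternating Least Squares.
# The rank r, number of iterations, and initialization scale are assumptions.
import numpy as np

def als_error_compensation(W, W_q, r=16, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    E = W - W_q                      # quantization error to be absorbed
    m, n = E.shape
    A = rng.standard_normal((m, r)) * 0.01
    B = rng.standard_normal((r, n)) * 0.01
    for _ in range(iters):
        # Fix A, solve least squares for B:  min_B ||E - A B||_F
        B, *_ = np.linalg.lstsq(A, E, rcond=None)
        # Fix B, solve least squares for A:  min_A ||E - A B||_F
        At, *_ = np.linalg.lstsq(B.T, E.T, rcond=None)
        A = At.T
    return A, B                      # W_q + A @ B approximates W
```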
We introduce a novel approach that refines low-rank matrix modeling by incorporating activation values and optimizing the low-rank factors directly through ALS, which is particularly effective under low-bit quantization. Furthermore, we revisit the quantization interval partitioning of Round-to-Nearest (RTN) methods by introducing scaling factors that turn the discontinuous truncation function into a continuous optimization problem, thereby improving quantization performance through more principled interval adjustment. Extensive experimental evaluations support our theoretical contributions.
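The following sketch extends the same ALS idea with activation weighting: the factors are fitted so that the residual error is small on calibration activations X, i.e. minimizing ||(W − W_q − AB)X||_F. The closed-form updates, damping term, and tensor shapes are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch (assumptions: shapes, rank r, damping eps) of an activation-weighted
# ALS update: fit A @ B to the error E = W - W_q so that (E - A @ B) @ X is
# small, where X holds calibration activations. Not the paper's exact method.
import numpy as np

def weighted_als_compensation(W, W_q, X, r=16, iters=20, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    E = W - W_q                              # (out, in) quantization error
    H = X @ X.T                              # (in, in) activation second moment
    H += eps * np.trace(H) / H.shape[0] * np.eye(H.shape[0])  # damping for stability
    m, n = E.shape
    A = rng.standard_normal((m, r)) * 0.01   # (out, r)
    B = rng.standard_normal((r, n)) * 0.01   # (r, in)
    for _ in range(iters):
        # Fix B: minimize ||(E - A B) X||_F over A  ->  A = E H B^T (B H B^T)^{-1}
        BH = B @ H
        A = np.linalg.solve(BH @ B.T, (E @ BH.T).T).T
        # Fix A: minimize ||(E - A B) X||_F over B; with H full rank this
        # reduces to the ordinary least-squares solution B = A^+ E
        B, *_ = np.linalg.lstsq(A, E, rcond=None)
    return A, B                              # W_q + A @ B compensates W on X
```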
Our research reveals how low-rank matrices can effectively capture the intrinsic information of large models, overcoming limitations of traditional SVD-based approaches. Comprehensive experiments on standard benchmarks consistently show that our method outperforms state-of-the-art quantization techniques, providing a principled, data-driven framework for understanding the role of low-rank structure in quantization error compensation. This advance represents a significant step toward practical LLM deployment, offering more efficient and effective model compression strategies.
Primary Area: optimization
Submission Number: 22650