Keywords: Large Language Models, Post-Training Quantization, Model Compression, CKA.
Abstract: Mainstream post-training quantization (PTQ) methods for large language models typically apply a single quantization algorithm uniformly across all network layers, overlooking the substantial differences in how well each algorithm suits individual layers. To address this limitation, we propose CALM (CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and uses linear Centered Kernel Alignment (CKA) as a similarity metric to automatically select the optimal quantization strategy per layer. The selected per-layer strategies are then combined to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods on mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
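To make the layer-wise selection step concrete, the following is a minimal sketch of how linear CKA could be used to pick a PTQ algorithm per layer, assuming one already has full-precision activations and activations from each candidate quantizer; the function names, data structures, and dictionary layout here are illustrative assumptions, not the authors' implementation.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered activations.
    hsic = (Y.T @ X).norm(p="fro") ** 2
    norm_x = (X.T @ X).norm(p="fro")
    norm_y = (Y.T @ Y).norm(p="fro")
    return (hsic / (norm_x * norm_y)).item()

def select_per_layer_quantizers(fp_layer_outputs, quantized_layer_outputs):
    """
    fp_layer_outputs: dict layer_name -> full-precision activations (n, d).
    quantized_layer_outputs: dict algo_name -> dict layer_name -> activations (n, d)
        produced by applying each candidate PTQ algorithm.
    Returns: dict layer_name -> name of the candidate with the highest CKA
    against the full-precision activations (hypothetical selection rule).
    """
    selection = {}
    for layer, fp_act in fp_layer_outputs.items():
        scores = {
            algo: linear_cka(fp_act, acts[layer])
            for algo, acts in quantized_layer_outputs.items()
        }
        selection[layer] = max(scores, key=scores.get)
    return selection
```

The per-layer winners from such a selection would then be assembled into a single hybrid model, which is the integration step the abstract describes.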
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English, Code (Python)
Submission Number: 141