Understanding and Selecting Calibration Data for LLM Quantization: From Sensitivity Analysis to Activation-Based Curation

Maoxiong; Benben Hai; Anzheng Wang; Xiangdong Liu; Lingwei Meng; Tingting Wang; Shuli Zheng; Yixin Zhou; Haibiao Chen

Published: 21 Jun 2026, Last Modified: 21 Jun 2026, ACL-SELVA 2026 Oral, CC BY 4.0

Keywords: Large Language Models, Post-Training Quantization, Calibration Sensitivity Analysis, Calibration Data Selection

TL;DR: This paper analyzes calibration-data sensitivity in LLM post-training quantization and proposes ACDM, an activation-centroid-based method for improving quantized performance and cross-task balance.

Abstract: Post-training quantization (PTQ) of large language models (LLMs) is highly sensitive to calibration data, yet what drives this sensitivity remains poorly understood. We present a systematic study spanning three PTQ algorithms—GPTQ, AWQ, and SmoothQuant+GPTQ—and six models from the Qwen2.5 and Llama-3.1 families, progressing from sensitivity analysis to activation-based data curation. Our sensitivity analysis jointly sweeps sample count and sequence length, revealing that their relative importance varies substantially across algorithms and model families, and identifying a robust combination that approaches saturated performance across both. We then show that domain-matched calibration data helps mainly for GPTQ on Qwen2.5 specialized variants, but provides little consistent benefit for AWQ, suggesting that AWQ's sensitivity is governed more by activation-distribution mismatch than by surface domain mismatch. Motivated by this diagnosis, we introduce the Activation Centroid Distance Metric (ACDM), a curation method that selects calibration samples by aligning their per-layer activation statistics with task-reference centroids estimated from held-out training-side samples, directly targeting the quantity that domain labels fail to capture. ACDM improves both average accuracy and cross-task balance over three baselines—random sampling, self-calibration, and ZipCal—across five benchmarks, achieving gains of $+0.95$ and $+0.49$ percentage points over random sampling on Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct, respectively. We release an automated toolkit at https://anonymous.4open.science/r/adaptive-quant-toolkit-D43E.
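The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample has already been reduced to one activation summary vector per layer (for example, a mean over tokens), that the distance is Euclidean, and that per-layer distances are averaged with equal weight. The abstract specifies none of these choices, and the function names `layer_centroids`, `centroid_distance`, and `select_calibration` are hypothetical.

```python
import math
from typing import List, Sequence

# One sample = a list of per-layer activation summary vectors.
# ASSUMPTION: the summary (e.g. mean over tokens) is computed elsewhere.
Sample = Sequence[Sequence[float]]


def layer_centroids(reference: Sequence[Sample]) -> List[List[float]]:
    """Average the reference samples layer by layer to get one centroid per layer."""
    n_samples = len(reference)
    n_layers = len(reference[0])
    centroids = []
    for layer in range(n_layers):
        dim = len(reference[0][layer])
        centroids.append(
            [sum(s[layer][d] for s in reference) / n_samples for d in range(dim)]
        )
    return centroids


def centroid_distance(sample: Sample, centroids: Sequence[Sequence[float]]) -> float:
    """Mean over layers of the Euclidean distance to that layer's centroid.

    ASSUMPTION: Euclidean distance and uniform layer weighting are
    illustrative choices, not taken from the paper.
    """
    dists = [math.dist(vec, c) for vec, c in zip(sample, centroids)]
    return sum(dists) / len(dists)


def select_calibration(
    candidates: Sequence[Sample], reference: Sequence[Sample], k: int
) -> List[int]:
    """Return indices of the k candidates closest to the reference centroids."""
    centroids = layer_centroids(reference)
    scores = [centroid_distance(s, centroids) for s in candidates]
    # sorted() is stable, so ties keep their original candidate order.
    order = sorted(range(len(candidates)), key=lambda i: scores[i])
    return order[:k]
```

In this sketch, `reference` stands in for the held-out training-side samples of a task and `candidates` for the calibration pool. Ranking by distance to the centroids favors samples whose activation statistics resemble the task's, independently of any domain label.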

Submission Number: 8
