Data-Free Transformer Quantization Using Parameter-Space Symmetry

Published: 09 Jun 2025, Last Modified: 09 Jun 2025, HiLD at ICML 2025 Poster, CC BY 4.0
Keywords: Quantization, Deep Learning
Abstract: Transformer models have seen widespread use in many learning tasks but incur large memory and compute costs, limiting their deployability. Post-Training Quantization (PTQ) is a promising solution but can lead to significant performance degradation. Many PTQ methods estimate weight and activation distributions with calibration data to account for outliers and maintain quantized performance. We propose a data-free approach to improve quantization by exploiting parameter-space symmetries. We address outliers and high variability in weights by finding a transformation of the model weights that minimizes quantization error variance. Our approach is lightweight, data-free, and can be integrated as a pre-processing step within other PTQ methods. We evaluate our approach by testing quantized large language models on several benchmark tasks.
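To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact procedure) of one well-known parameter-space symmetry that a data-free pre-processing step could exploit: positive diagonal rescaling across a ReLU between two linear layers leaves the network function unchanged while reshaping the weight ranges seen by the quantizer. The quantizer, the equalization rule, and all variable names below are assumptions for illustration only.

```python
import numpy as np

def quantize(w, bits=8):
    """Symmetric per-tensor uniform quantization (round-to-nearest)."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def equalize(w1, w2):
    """Rescale rows of w1 and matching columns of w2 so their ranges agree.

    For s > 0, relu(s * z) = s * relu(z), so f(x) = w2 @ relu(w1 @ x) is
    unchanged by this reparameterization; only the weight statistics change.
    """
    r1 = np.max(np.abs(w1), axis=1)   # per-output-channel range of layer 1
    r2 = np.max(np.abs(w2), axis=0)   # per-input-channel range of layer 2
    s = np.sqrt(r2 / r1)              # equalizes the two ranges per channel
    return w1 * s[:, None], w2 / s[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(16,))
w1 = rng.normal(size=(32, 16)) * rng.uniform(0.1, 10.0, size=(32, 1))  # outlier channels
w2 = rng.normal(size=(8, 32))

relu = lambda z: np.maximum(z, 0.0)
ref = w2 @ relu(w1 @ x)                              # full-precision output

naive = quantize(w2) @ relu(quantize(w1) @ x)        # quantize weights as-is
w1e, w2e = equalize(w1, w2)
equalized = quantize(w2e) @ relu(quantize(w1e) @ x)  # quantize after rescaling

print("error (naive):    ", np.linalg.norm(ref - naive))
print("error (equalized):", np.linalg.norm(ref - equalized))
```

With per-tensor quantization, channels with outlier magnitudes force a coarse quantization step for the whole tensor; the function-preserving rescaling spreads the dynamic range across the two layers and typically shrinks the quantization error, which is the kind of data-free improvement the abstract describes.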
Student Paper: Yes
Submission Number: 57