Keywords: Outlier reduction, Post-training quantization, rotations
TL;DR: Learning rotations that provably reduce quantization error.
Abstract: We introduce OptRot, a data-free preprocessing method that learns fusible rotations for post-training quantization of language models. OptRot reduces weight outliers by finding rotations that minimize the sum of element-wise fourth powers of the rotated weights. We show that reducing weight outliers can provably improve weight quantization performance, and that OptRot rotations can outperform both Hadamard rotations and the rotations learned by the data-dependent method SpinQuant.
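The abstract describes learning an orthogonal rotation that minimizes the fourth powers of the rotated weights. The sketch below is a hypothetical illustration of that idea, not the authors' implementation: the function name `learn_rotation`, the matrix-exponential parameterization of the rotation, and all hyperparameters are assumptions chosen only to make the objective concrete.

```python
# Hypothetical sketch (assumed details, not the paper's code): learn an
# orthogonal rotation R that reduces weight outliers by minimizing the sum of
# element-wise fourth powers of the rotated weight matrix W @ R.
import torch

def learn_rotation(W: torch.Tensor, steps: int = 500, lr: float = 1e-2) -> torch.Tensor:
    d = W.shape[1]
    # Parameterize R = exp(A - A^T); the matrix exponential of a
    # skew-symmetric matrix is always orthogonal.
    A = torch.zeros(d, d, requires_grad=True)
    opt = torch.optim.Adam([A], lr=lr)
    for _ in range(steps):
        R = torch.matrix_exp(A - A.T)        # orthogonal by construction
        loss = ((W @ R) ** 4).sum()          # fourth-power outlier objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.matrix_exp(A - A.T).detach()

# Usage sketch: rotate a weight matrix before quantizing it; because R is
# orthogonal, it can be fused into adjacent layers (apply R^T on the
# activation side) without changing the network's function.
W = torch.randn(256, 64)
R = learn_rotation(W)
W_rotated = W @ R
```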
Submission Number: 33