Squashed Weight Distribution for Low Bit Quantization of Deep Models

Nikko Strom, Haidar Khan, Wael Hamza

Published: 2022, Last Modified: 29 Sept 2023 | INTERSPEECH 2022 | Readers: Everyone

Abstract: Inference with large deep learning models in resource-constrained settings is increasingly a bottleneck in real-world applications of state-of-the-art AI. Here we address this by low-precision weight quantization. We achieve very low accuracy degradation by re-parametrizing the weights in a way that leaves the weight distribution approximately uniform. We show lower bit-width quantization and less accuracy degradation than previously reported in experiments on GLUE benchmarks (3-bit, 0.2% rel. degradation), and on internal intent/slot-filling datasets (2-bit, 0.4% rel. degradation).
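
As a rough illustration of why an approximately uniform weight distribution suits low-bit quantization, the sketch below compares how evenly a 3-bit uniform quantizer uses its levels on raw versus squashed weights. The abstract does not specify the re-parametrization, so everything here is an assumption made for illustration: the weights are synthetic Gaussian samples, the squashing function is the Gaussian CDF rescaled to (-1, 1), and the names `squash`, `uniform_quantize`, and `bin_occupancy` are hypothetical. It is not the authors' method.

```python
import math
import numpy as np

def squash(w, sigma):
    """Map roughly Gaussian weights into (-1, 1) with a rescaled Gaussian CDF.

    Assumed stand-in for the paper's re-parametrization, which the
    abstract does not describe.
    """
    erf = np.vectorize(math.erf)
    return erf(w / (sigma * math.sqrt(2.0)))

def uniform_quantize(u, bits):
    """Assign values in [-1, 1] to 2**bits equal-width bins; return integer codes."""
    levels = 2 ** bits
    codes = np.floor((u + 1.0) / 2.0 * levels).astype(np.int64)
    return np.clip(codes, 0, levels - 1)  # u == 1.0 falls into the top bin

def bin_occupancy(codes, bits):
    """Fraction of weights assigned to each quantization level."""
    counts = np.bincount(codes, minlength=2 ** bits)
    return counts / counts.sum()

# Synthetic weights (assumption): zero-mean Gaussian, as is typical of trained layers.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=100_000)
sigma = w.std()
bits = 3

# Baseline: scale raw weights into [-1, 1] by their largest magnitude.
raw_codes = uniform_quantize(w / np.abs(w).max(), bits)
# Squashed: quantize after the assumed squashing function.
squashed_codes = uniform_quantize(squash(w, sigma), bits)

raw_occ = bin_occupancy(raw_codes, bits)
sq_occ = bin_occupancy(squashed_codes, bits)

print("raw occupancy:     ", np.round(raw_occ, 3))
print("squashed occupancy:", np.round(sq_occ, 3))
```

Under these assumptions the raw weights crowd into the central levels and leave the outer ones almost unused, while the squashed weights fill all eight levels about equally, so each of the 3 bits carries close to its full information.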
