Differentiable Soft Min-Max Loss to Restrict Weight Range for Model Quantization

Published: 27 Jun 2024, Last Modified: 20 Aug 2024
Venue: Differentiable Almost Everything
License: CC BY 4.0
Keywords: soft min-max, quantization, weight clustering
Abstract: A wide range of weights in a model hinders effective lower-bit quantization. Penalizing the weight range improves quantization accuracy, but the range (max − min) is not differentiable. In this work, we propose the Differentiable Soft Min-Max Loss (DSMM) to restrict weight ranges, yielding a quantization-friendly model with narrow weight ranges. DSMM uses a learnable parameter that adjusts its hardness, without requiring an extra hyper-parameter. DSMM improves lower-bit quantization accuracy with state-of-the-art post-training quantization (PTQ), quantization-aware training (QAT), and weight clustering across various domains and model sizes.
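The abstract does not spell out the exact form of DSMM, but the core idea — replacing the non-differentiable max − min with a smooth surrogate whose sharpness is controlled by a hardness parameter — can be sketched with a temperature-scaled log-sum-exp. The names `soft_max` and `soft_range` below are illustrative, not the paper's notation, and this is only one plausible smooth surrogate, assuming a log-sum-exp construction:

```python
import math

def soft_max(ws, t):
    """Smooth approximation of max(ws) with hardness 1/t.

    Uses t * log(mean(exp(w / t))): everywhere differentiable in w,
    and converges to the hard max as t -> 0+.
    """
    m = max(ws)  # subtract the max inside exp for numerical stability
    return m + t * math.log(sum(math.exp((w - m) / t) for w in ws) / len(ws))

def soft_range(ws, t):
    """Smooth surrogate for the weight range max(ws) - min(ws).

    soft_min(w) = -soft_max(-w), so the soft range is the sum of two
    soft maxima; a loss can penalize this quantity per weight tensor.
    """
    return soft_max(ws, t) + soft_max([-w for w in ws], t)

weights = [-1.0, 0.2, 0.5, 2.0]  # toy weight tensor; true range is 3.0
print(soft_range(weights, 0.001))  # close to 3.0 at high hardness
print(soft_range(weights, 1.0))    # softer, smaller surrogate value
```

In a training loop, `t` (or its inverse, the hardness) could itself be a learnable parameter, matching the abstract's claim that DSMM adjusts its hardness without a hand-tuned hyper-parameter; at small `t` the surrogate tightly tracks the true range while remaining differentiable.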
Submission Number: 19