On the Quantization of Neural Models for Speaker Verification

Published: 01 Jan 2024, Last Modified: 10 Feb 2025 · IEEE/ACM Trans. Audio Speech Lang. Process. 2024 · CC BY-SA 4.0
Abstract: This paper addresses the sub-optimality of current post-training quantization (PTQ) and quantization-aware training (QAT) methods for state-of-the-art speaker verification (SV) models featuring intricate architectural elements such as channel aggregation and squeeze-and-excitation modules. To address these limitations, we propose 1) a data-independent PTQ technique employing iterative low-precision calibration on pre-trained models; and 2) a data-dependent QAT method designed to reduce the performance gap between full-precision and integer models. Our QAT involves two progressive stages: FP32 weights are first transformed into FP8, adapting precision based on the gradient norm, followed by learning the quantizer parameters (scale and zero-point) for INT8 conversion. Experiments demonstrate the effectiveness of our methods, showing reduced floating-point operations and INT8 inference time while maintaining performance on par with full-precision models.
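The second QAT stage described above learns quantizer parameters (a scale and zero-point) for the INT8 mapping. The sketch below illustrates one common way such a learned affine quantizer can be implemented with a straight-through estimator; the class, parameter names, and initial values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a learned affine (scale + zero-point) INT8 fake-quantizer,
# assuming a PyTorch setup. Names and initial values are hypothetical.
import torch
import torch.nn as nn


def round_ste(x: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator for rounding: forward pass uses round(x),
    # backward pass lets the gradient through unchanged.
    return x + (torch.round(x) - x).detach()


class LearnedAffineQuantizer(nn.Module):
    """Fake-quantizes a tensor to the INT8 range with learnable scale and zero-point."""

    def __init__(self, init_scale: float = 0.05, init_zero_point: float = 0.0,
                 qmin: int = -128, qmax: int = 127):
        super().__init__()
        # Scale and zero-point are trained jointly with the network weights.
        self.scale = nn.Parameter(torch.tensor(init_scale))
        self.zero_point = nn.Parameter(torch.tensor(init_zero_point))
        self.qmin, self.qmax = qmin, qmax

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantize: map x to integer levels and clamp to the INT8 range.
        q = torch.clamp(round_ste(x / self.scale + self.zero_point),
                        self.qmin, self.qmax)
        # Dequantize back to float so training proceeds in floating point,
        # while the forward pass "sees" INT8-quantized values.
        return (q - self.zero_point) * self.scale


if __name__ == "__main__":
    quantizer = LearnedAffineQuantizer()
    w = torch.randn(4, 4, requires_grad=True)
    loss = quantizer(w).pow(2).mean()
    loss.backward()  # gradients reach both the weights and the quantizer scale
    print(quantizer.scale.grad)
```

In such a scheme, the gradient reaching the scale and zero-point is what allows the quantization grid itself to adapt during training, which is the role the abstract assigns to the second QAT stage.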