AAM-Sinc-MV1D: A Model of Short-term Speaker Recognition Based on The Raw Waveform

Published: 2024, Last Modified: 04 Nov 2025, NaNA 2024, CC BY-SA 4.0
Abstract: Speaker recognition is a cutting-edge technology that identifies individuals by their unique voice characteristics. To address the challenges associated with data collection, we use deep learning to introduce two lightweight speaker recognition models, Sinc-MN1D and AAM-Sinc-MN1D. Both models combine recent advances in deep learning and speaker verification by using a modified MobileNetV2 framework as the core module. To capture essential short-term speaker features, we replace the initial convolutional layer of the backbone network with a modified convolutional layer inspired by the optimized SincNet. Furthermore, to improve the extraction of critical frequency features, we incorporate the AAM-softmax loss function, commonly used in face recognition, which strengthens the model's ability to identify challenging samples. We evaluate our method on the TIMIT dataset, where it outperforms the baseline approach.
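The abstract names two generic building blocks without giving their formulas or settings. As an illustration only, the pure-Python sketch that follows implements the original SincNet band-pass kernel (a Hamming-windowed difference of two sinc low-pass filters, whose two cut-off frequencies are the only learnable parameters per filter) and a single-sample AAM-softmax (additive angular margin) cross-entropy. It is not the authors' code: the paper's modified sinc layer and MobileNetV2 backbone are not reproduced, the helper names are hypothetical, and the cut-offs (300 to 3400 Hz), 101-tap kernel, 16 kHz sampling rate, margin 0.2 and scale 30 are assumed values chosen for the demo.

```python
import math


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size, sample_rate):
    """Hamming-windowed band-pass FIR kernel in the style of the original SincNet.

    h[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), where f1 and f2 are the
    cut-off frequencies divided by the sample rate. In a trainable sinc layer
    only the two cut-offs per filter would be learned (assumption of this sketch).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is centred at n = 0")
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    half = (kernel_size - 1) // 2
    kernel = []
    for k in range(kernel_size):
        n = k - half
        if n == 0:
            value = 2.0 * (f2 - f1)  # limit of the expression as n -> 0
        else:
            value = (math.sin(2 * math.pi * f2 * n) - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))  # Hamming window
        kernel.append(value * window)
    return kernel


def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation of a raw waveform (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def _unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive angular margin (AAM / ArcFace-style) softmax cross-entropy for one sample.

    The target logit is s*cos(theta_y + m); every other logit is s*cos(theta_j).
    Common implementations also special-case theta_y + m > pi; this sketch omits that.
    """
    x = _unit(embedding)
    cosines = [sum(a * b for a, b in zip(x, _unit(w))) for w in class_weights]
    logits = []
    for j, cos_j in enumerate(cosines):
        if j == target:
            theta = math.acos(max(-1.0, min(1.0, cos_j)))  # clamp guards against rounding
            logits.append(scale * math.cos(theta + margin))
        else:
            logits.append(scale * cos_j)
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def mean_energy(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate for the demo
    kernel = sinc_bandpass(300.0, 3400.0, kernel_size=101, sample_rate=sr)
    in_band = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1600)]
    out_band = [math.sin(2 * math.pi * 6000 * t / sr) for t in range(1600)]
    print("1 kHz energy:", mean_energy(conv1d_valid(in_band, kernel)))
    print("6 kHz energy:", mean_energy(conv1d_valid(out_band, kernel)))

    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy weight vector per speaker
    emb = [1.0, 0.8]  # toy speaker embedding belonging to class 0
    print("plain softmax loss:", aam_softmax_loss(emb, weights, target=0, margin=0.0))
    print("AAM-softmax loss:  ", aam_softmax_loss(emb, weights, target=0, margin=0.2))
```

The demo filters a 1 kHz tone (inside the assumed pass band) and a 6 kHz tone (outside it), then shows that adding the angular margin raises the loss for the same embedding. That extra penalty is what the margin is intended to use to push embeddings of the same speaker closer together and away from other speakers.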