Simplified Skip-Connected UNet for Robust Speaker Verification Under Noisy Environments

Zonghui Wang, Zhihua Fang, Zhida Song, Liang He

Published: 01 Jan 2024, Last Modified: 05 Jun 2025, ISCSLP 2024, License: CC BY-SA 4.0

Abstract: In recent years, deep neural network-based methods for speaker verification have made remarkable progress in clean environments. However, background noise significantly reduces the accuracy and reliability of speaker verification systems by masking or distorting the speaker's voice characteristics. In this paper, we propose a cascaded framework, optimized with a multi-objective loss, to mitigate the interference of various types and levels of noise on the speaker verification task. The proposed architecture consists of two components: a speech enhancement module based on an improved 2D-UNet, which alleviates the structural limitations of applying the classical UNet directly to noise reduction, and a back-end speaker embedding extraction module. We carry out experiments on the VoxCeleb1 and VOiCES datasets, as well as under out-of-domain noise conditions. The evaluations demonstrate that the method shows great potential for speaker verification in noisy environments.
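The abstract describes a cascade (an enhancement front end followed by a speaker embedding extractor) trained with a multi-objective loss. The plain-Python sketch below illustrates only that wiring and a loss of that general shape. It is not the authors' implementation: the MSE enhancement term, the cross-entropy speaker term, the trade-off weight `alpha`, cosine scoring, and the `threshold` value are all assumptions, and the hypothetical `enhance` and `embed` callables stand in for the improved 2D-UNet and the back-end extractor, which are not reproduced here.

```python
import math
from typing import Callable, Sequence, Tuple


def mse(enhanced: Sequence[float], clean: Sequence[float]) -> float:
    """Enhancement term (assumed): mean squared error between enhanced and clean features."""
    if len(enhanced) != len(clean) or not clean:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Speaker term (assumed): softmax cross-entropy over training-speaker classes."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multi_objective_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    speaker_logits: Sequence[float],
    speaker_id: int,
    alpha: float = 0.5,  # hypothetical trade-off weight between the two objectives
) -> float:
    """Hypothetical weighted sum of the enhancement and speaker-identity terms."""
    return alpha * mse(enhanced, clean) + (1.0 - alpha) * cross_entropy(speaker_logits, speaker_id)


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / norm


def verify(
    noisy_a: Sequence[float],
    noisy_b: Sequence[float],
    enhance: Callable[[Sequence[float]], Sequence[float]],  # stand-in for the enhancement module
    embed: Callable[[Sequence[float]], Sequence[float]],    # stand-in for the embedding extractor
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Tuple[float, bool]:
    """Cascade at test time: enhance each utterance, embed it, then score the trial pair."""
    score = cosine_score(embed(enhance(noisy_a)), embed(enhance(noisy_b)))
    return score, score >= threshold
```

With identity functions as stand-ins, `verify([3.0, 4.0], [3.0, 4.0], lambda x: x, lambda x: x)` returns `(1.0, True)`. Keeping enhancement and embedding as separate callables mirrors the cascaded design: because training uses a joint loss, the front end can be tuned for what the speaker model needs rather than for signal fidelity alone.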