Evaluating Deep Speaker Embedding Robustness to Domain, Sampling Rate, and Codec Variations

Published: 2025, Last Modified: 07 Jan 2026, INTERSPEECH 2025, License: CC BY-SA 4.0
Abstract: Speaker verification systems based on deep speaker embeddings perform well when training and evaluation conditions match, but their performance degrades under domain shift. This work evaluates the sensitivity of four speaker embedding models (ECAPA-TDNN, TitaNet, ECAPA2, and ReDimNet) to variations in acoustic domain, sampling rate, and audio codec. Experiments on datasets containing far-field speech, noise, and music interference show that all models degrade under mismatched conditions. ReDimNet shows the smallest degradation of the four models but is still affected in certain cases. Downsampling and low-bitrate compression degrade performance further, revealing the models' reliance on high-frequency information and their sensitivity to compression artifacts.