SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions

ACL ARR 2025 May Submission6780 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Speaker verification (SV) models are increasingly integrated into security, personalization, and access control systems, yet their robustness to many real-world challenges remains inadequately benchmarked. Real-world systems face diverse conditions, some naturally occurring and others created deliberately or even maliciously, that introduce mismatches between enrollment and test data and degrade performance. Ideally, the effect of each of these conditions on model performance should be benchmarked; however, existing benchmarks fall short, generally evaluating only a subset of potential conditions and missing others entirely. We introduce SVeritas, the Speaker Verification tasks benchmark suite, which evaluates the performance of speaker verification systems under an extensive variety of stressors: "natural" variations such as the duration, spontaneity, and content of the recordings; background conditions such as noise, microphone distance, reverberation, and channel mismatches; recording-condition influences such as audio bandwidth and the effect of various codecs; physical influences such as the age and health conditions of the speaker; and the susceptibility of the models to spoofing and adversarial attacks. While several existing benchmarks each cover some of these issues, SVeritas is the first comprehensive evaluation that not only includes all of them but also adds several entirely new, yet important, real-life conditions that have not previously been benchmarked. We use SVeritas to evaluate several state-of-the-art SV models and observe that while some architectures remain stable under common distortions, they suffer substantial performance degradation in scenarios involving cross-language trials, age mismatches, and codec-induced compression. Extending our analysis across demographic subgroups, we further identify disparities in robustness across age groups, gender, and linguistic backgrounds. By standardizing evaluation under realistic and synthetic stress conditions, SVeritas enables precise diagnosis of model weaknesses and establishes a foundation for advancing equitable and reliable speaker verification systems.

Paper Type: Long

Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding

Research Area Keywords: speaker verification, robustness benchmarking, evaluation, codec, channel mismatch, adversarial attacks

Contribution Types: Model analysis & interpretability

Languages Studied: English; over 100 languages were benchmarked

Submission Number: 6780
