STCON NIST SRE24 System: Composite Speaker Recognition Solution for Challenging Scenarios

Stepan Malykh, Alexander Anikin, Nikita Khmelev, Anastasia Korenevskaya, Anastasia Zorkina, Sergey Novoselov, Vladislav Marchevskiy, Vladimir Volokhov, Andrey Shulipa, Alexander Kozlov, Alexander Melnikov, Vasiliy Galyuk, Timur Pekhovskiy

Published: 2025, Last Modified: 21 Jan 2026. Venue: INTERSPEECH 2025. License: CC BY-SA 4.0

Abstract: This paper addresses the real-world challenges of voice biometrics, specifically those highlighted by the National Institute of Standards and Technology Speaker Recognition (SR) Evaluation 2024 challenge (NIST SRE24). We present multi-module SR systems that integrate speaker diarization, robust embedding extraction, and adaptive scoring to handle multi-speaker recordings, variable speech segment durations, and cross-channel and cross-lingual speaker recognition scenarios. We propose a methodology for training robust speaker embedding extractors and explore a range of state-of-the-art SR neural network architectures. We focus on the combined optimization of two traditionally separate components, diarization and speaker recognition, and present practical observations on this integrated approach. These findings allowed our systems to secure top leaderboard positions in NIST SRE24.

External IDs: dblp:conf/interspeech/MalykhAKKZNMVSK25