Online Neural Speaker Diarization with Core Samples

Yanyan Yue, Jun Du, Maokui He

Published: 2022, Last Modified: 13 Nov 2024, CCBR 2022, CC BY-SA 4.0

Abstract: We propose an online neural diarization method based on TS-VAD, which performs remarkably well on highly overlapping speech. We introduce online VBx to help TS-VAD obtain the target-speaker embeddings. First, while the amount of data is insufficient, only online VBx is executed to accumulate speaker information. Afterwards, a separate offline subsystem extracts i-vectors from core samples for TS-VAD online decoding. Finally, we devise a speaker selection strategy that allows TS-VAD to handle an unknown number of speakers. We evaluate our system on the AliMeeting dataset. The experimental results demonstrate that our online method can effectively handle highly overlapped audio.
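
The staged control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sufficiency threshold, the number of TS-VAD speaker slots, the function names, and the "most accumulated speech first" selection rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Optional

# Hypothetical values: the abstract does not state how much data is
# "sufficient" or how many speaker slots the TS-VAD model has.
MIN_SPEECH_SECONDS = 30.0
NUM_SLOTS = 4


def choose_stage(speech_per_speaker: Dict[str, float],
                 min_seconds: float = MIN_SPEECH_SECONDS) -> str:
    """Return 'vbx_only' while accumulated speech is insufficient, else 'ts_vad'."""
    total = sum(speech_per_speaker.values())
    return "ts_vad" if total >= min_seconds else "vbx_only"


def select_speakers(speech_per_speaker: Dict[str, float],
                    num_slots: int = NUM_SLOTS) -> List[Optional[str]]:
    """Fill a fixed number of slots with speaker labels.

    Assumed rule: keep the speakers with the most accumulated speech
    (ties broken by label) and pad unused slots with None.
    """
    ranked = sorted(speech_per_speaker,
                    key=lambda s: (-speech_per_speaker[s], s))
    chosen: List[Optional[str]] = list(ranked[:num_slots])
    return chosen + [None] * (num_slots - len(chosen))


if __name__ == "__main__":
    early = {"spk1": 5.0}
    later = {"spk1": 20.0, "spk2": 15.0}
    print(choose_stage(early), choose_stage(later))
    print(select_speakers(later))
```

In this sketch, padding with None stands in for whatever placeholder embedding a fixed-slot model would need when fewer speakers have been detected than there are slots; how the actual system handles that case is not stated in the abstract.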