Improving Speaker Diarization Using Semantic Information: Joint Pairwise Constraints Propagation

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone

Abstract: Speaker diarization, an important task in speech processing, has predominantly relied on acoustic signal analysis to differentiate speakers. This reliance on acoustic features often overlooks the semantic content of speech, which can provide additional clues to speaker identity. To address this gap, our study introduces a semantically enriched diarization approach that extends beyond the acoustic domain and draws on linguistic content. We present a novel method that uses language understanding to extract semantic cues that help attribute contributions to speakers within a conversation. Our approach formulates these cues as pairwise constraints and incorporates them into a multi-modal clustering process that segments and classifies speakers and their spoken content. Integrating these semantically derived constraints into the diarization pipeline yields substantial improvements in accuracy. Extensive evaluations on public data show that our method consistently outperforms acoustic-only systems, offering a more holistic view of speaker diarization by exploiting the semantic information in speech.

Paper Type: short

Research Area: Speech recognition, text-to-speech and spoken language understanding

Contribution Types: NLP engineering experiment

Languages Studied: English, Mandarin
