Improving Speaker Diarization Using Semantic Information: Joint Pairwise Constraints Propagation

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone
Abstract: Speaker diarization, an important task in speech processing, has predominantly relied on acoustic signal analysis to differentiate speakers. This reliance on acoustic features often overlooks the wealth of semantic content in speech, which can provide additional clues about speaker identity. To address this gap, our study introduces a semantically enriched diarization approach that extends beyond the acoustic domain and draws on the nuances of linguistic content. We present a novel method that employs advanced language understanding to extract semantic cues, which are integral to discerning each speaker's contributions within a conversation. Our approach uses these cues to formulate pairwise constraints, introducing a multi-modal clustering process that segments and classifies speakers and their spoken content. By integrating these semantically derived constraints into the diarization pipeline, we achieve substantial improvements in accuracy. Extensive evaluations on public datasets show that our method consistently outperforms acoustic-only systems, offering a more holistic perspective on speaker diarization that fully embraces the semantic information in speech.
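The abstract does not say how the pairwise constraints are built or propagated, so the following is only a generic sketch of the idea under stated assumptions, not the authors' joint propagation method. It assumes that semantic analysis has already produced must-link pairs (segments judged to come from the same speaker) and cannot-link pairs (segments judged to come from different speakers). Must-links are propagated transitively, cannot-links are extended to every member of the affected must-link groups, and the result overrides an acoustic cosine-similarity matrix that feeds a simple average-linkage agglomerative clustering. The function names `constrained_affinity` and `cluster`, the hard-constraint treatment, and the `threshold=0.5` stopping value are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations

# Sentinel marking a cannot-link pair (hypothetical convention; cosine never yields -inf).
CANNOT = float("-inf")

def cosine(u, v):
    """Cosine similarity between two speaker embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def constrained_affinity(embeddings, must_link, cannot_link):
    """Acoustic affinity matrix overridden by (assumed) semantic pairwise constraints."""
    n = len(embeddings)
    A = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]

    # Propagate must-links transitively with a small union-find.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in must_link:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    for members in groups.values():
        for i, j in combinations(members, 2):
            A[i][j] = A[j][i] = 1.0  # treat must-linked segments as maximally similar

    # Extend each cannot-link to all members of both must-link groups.
    # Conflicting constraints (a cannot-link inside one group) are not handled here.
    for i, j in cannot_link:
        for a in groups[find(i)]:
            for b in groups[find(j)]:
                A[a][b] = A[b][a] = CANNOT
    return A

def cluster(A, threshold=0.5):
    """Average-linkage agglomerative clustering that never merges across a cannot-link."""
    clusters = [[i] for i in range(len(A))]
    while len(clusters) > 1:
        best, pair = None, None
        for x, y in combinations(range(len(clusters)), 2):
            cross = [A[i][j] for i in clusters[x] for j in clusters[y]]
            if CANNOT in cross:  # hard cannot-link: skip this merge
                continue
            score = sum(cross) / len(cross)
            if best is None or score > best:
                best, pair = score, (x, y)
        if pair is None or best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)
```

With four toy two-dimensional embeddings, acoustic similarity alone groups segment 3 with segments 1 and 2. Adding one must-link (2, 3) and one cannot-link (1, 2), standing in for cues a language model might supply, yields a different partition:

```python
emb = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.5, 0.9]]
print(cluster(constrained_affinity(emb, [], [])))                # [[0], [1, 2, 3]]
print(cluster(constrained_affinity(emb, [(2, 3)], [(1, 2)])))    # [[0, 1], [2, 3]]
```

Treating the constraints as hard overrides keeps the sketch short. A system that weighs noisy semantic cues against acoustic evidence would more likely use soft penalties instead.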
Paper Type: short
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment
Languages Studied: English, Mandarin