VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

Published: 01 Jan 2022, Last Modified: 05 Nov 2023, INTERSPEECH 2022
Abstract: Speaker extraction seeks to extract the target speaker's speech in a multi-talker scenario, given an auxiliary reference. Such a reference can be auditory, i.e., pre-recorded speech of the target speaker; visual, i.e., lip movements; or contextual, i.e., a phonetic sequence.