Towards real-time audiovisual speaker localization

2011 (modified: 09 Nov 2022), EUSIPCO 2011
Abstract: There is growing interest in multi-modal signal processing, in which sets of related signals are processed jointly to extract information that remains hidden when the modalities are considered independently. One popular problem in cross-modal processing is the localization of visual sources that are synchronous with audio stimuli. Audiovisual source localization makes it possible to pinpoint and extract salient audio-video information from a scene, enabling innovative applications in communication, interaction and gaming. In this paper we aim to achieve cross-modal localization in real time using data from a single camera and a single microphone. Existing works rely on complex statistical data models or complex representations of audio and video features, which limits their applicability in real-time systems. We propose a simple yet effective algorithm that detects and localizes synchronous audio-video sources in real time. The proposed approach obtains the best speaker localization performance reported to date on the popular CUAVE database, while running in real time and without requiring any training.