Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation

Published: 01 Jan 2021, Last Modified: 15 May 2023. Venue: IEEE Signal Process. Lett. 2021. Readers: Everyone
Abstract: Neither a monocular RGB camera nor a small-size microphone array is capable of accurate three-dimensional (3D) speaker localization. By taking advantage of accurate visual object detection and complementary audio-visual sensor fusion, we formulate the 3D speaker localization problem as a visual scaling factor estimation problem. As a result, we effectively reduce traditional audio-only 3D speaker localization from an exhaustive grid search to a one-dimensional (1D) optimization problem. We propose a multi-modal perception system with two optimization approaches. We show that the proposed methods are effective, accurate, and robust against interference and, as corroborated by indicative empirical results on a real dataset, competitive with conventional uni-modal and state-of-the-art audio-visual speaker localization approaches.
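To illustrate how a scaling-factor formulation can turn a 3D search into a 1D one, here is a minimal sketch. It is not the authors' method: the abstract does not specify the cost function or either of the two optimization approaches. The sketch assumes a pinhole camera with known intrinsics `K`, a microphone array whose positions are expressed in the camera frame, and a least-squares time-difference-of-arrival (TDOA) cost minimized by a coarse grid followed by golden-section refinement. All names (`backproject`, `tdoas`, `estimate_scale`) and all numeric values are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)


def backproject(pixel, K):
    """Ray direction (normalized so z = 1) through a pixel of a pinhole camera."""
    u, v = pixel
    return np.linalg.inv(K) @ np.array([u, v, 1.0])


def tdoas(position, mics):
    """TDOAs of mics 1..N-1 relative to mic 0 for a source at `position`."""
    dists = np.linalg.norm(mics - position, axis=1)
    return (dists[1:] - dists[0]) / SPEED_OF_SOUND


def estimate_scale(ray, mics, measured, s_min=0.3, s_max=8.0,
                   n_grid=200, n_refine=40):
    """1D search for the scale s so that s * ray best explains the TDOAs."""
    def cost(s):
        return float(np.sum((tdoas(s * ray, mics) - measured) ** 2))

    # Coarse grid search to bracket the minimum along the visual ray.
    grid = np.linspace(s_min, s_max, n_grid)
    i = int(np.argmin([cost(s) for s in grid]))
    lo, hi = grid[max(i - 1, 0)], grid[min(i + 1, n_grid - 1)]

    # Golden-section refinement inside the bracket.
    g = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if cost(a) < cost(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


# Synthetic, noise-free example (all values are illustrative assumptions).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
mics = np.array([[0.0, 0.0, 0.0],
                 [0.1, 0.0, 0.0],
                 [-0.1, 0.0, 0.0],
                 [0.0, 0.1, 0.0]])
ray = backproject((400.0, 200.0), K)   # pixel of the detected speaker
true_scale = 2.5
measured = tdoas(true_scale * ray, mics)

s_hat = estimate_scale(ray, mics, measured)
speaker_xyz = s_hat * ray              # estimated 3D position, camera frame
print(s_hat, speaker_xyz)
```

In this sketch the visual detection fixes the direction of the speaker, so the audio measurements only need to resolve one unknown, the scale `s` along that ray. With noisy TDOAs from a small array the cost is much flatter in `s`, so a practical system would need a more robust audio cost than the one assumed here.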