Dynamically locating multiple speakers based on the time-frequency domain

Hodaya Hammer; Shlomo Chazan; Jacob Goldberger; Sharon Gannot

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Withdrawn Submission

Keywords: speaker localisation, microphone array, U-Net

Abstract: In this study, we present an online multi-speaker localisation algorithm, based on a deep neural network, that uses a multi-microphone array. A fully convolutional network is trained with instantaneous spatial features to estimate the direction of arrival (DOA) for each time-frequency bin. This high-resolution, per-bin classification enables the network to accurately and simultaneously localise and track multiple speakers, both static and dynamic. An extensive experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms.
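The abstract describes classifying the DOA of every time-frequency bin from instantaneous spatial features and then localising several speakers at once. The snippet below is a minimal NumPy sketch of that pipeline under stated assumptions, not the authors' implementation. It assumes that the spatial feature is the phase difference of each microphone relative to a reference microphone, that the array is a far-field uniform linear array, and that each time-frequency bin is dominated by a single speaker. It also replaces the trained fully convolutional network with a hypothetical template-matching classifier (`classify_bins`). All function names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def steering_phases(doas_deg, mic_pos, freqs):
    """Phase of the far-field transfer function of each mic relative to mic 0,
    for a linear array, using the convention tau_m = (p_m - p_0) cos(theta) / c.
    Returns shape (n_doas, n_mics, n_freqs)."""
    theta = np.deg2rad(np.asarray(doas_deg, dtype=float))[:, None, None]
    tau = (mic_pos - mic_pos[0])[None, :, None] * np.cos(theta) / SPEED_OF_SOUND
    return -2.0 * np.pi * freqs[None, None, :] * tau

def spatial_features(X):
    """Instantaneous phase difference of every mic w.r.t. mic 0 in each TF bin.
    X: complex STFT of shape (n_mics, n_freqs, n_frames)."""
    return np.angle(X * np.conj(X[:1]))

def classify_bins(phase, candidates, mic_pos, freqs):
    """Hypothetical stand-in for the trained network: label each TF bin with the
    candidate DOA whose expected phases best match the observed ones."""
    expected = steering_phases(candidates, mic_pos, freqs)         # (D, M, F)
    score = np.cos(phase[None] - expected[..., None]).sum(axis=1)  # (D, F, T)
    return np.argmax(score, axis=0)                                # (F, T) class indices

def estimate_speakers(labels, candidates, n_speakers):
    """Global DOA histogram over all TF bins; returns the n_speakers largest peaks."""
    counts = np.bincount(labels.ravel(), minlength=len(candidates))
    top = np.argsort(counts)[::-1][:n_speakers]
    return np.sort(candidates[top])

def frame_histograms(labels, n_candidates):
    """Per-frame DOA histograms over frequency, usable for frame-by-frame tracking."""
    return np.stack([np.bincount(labels[:, t], minlength=n_candidates)
                     for t in range(labels.shape[1])])

# Synthetic demo: two static speakers, each TF bin dominated by one of them.
rng = np.random.default_rng(0)
n_mics, n_freqs, n_frames = 4, 64, 50
mic_pos = 0.04 * np.arange(n_mics)           # 4 cm spacing: no spatial aliasing below ~4.3 kHz
freqs = np.linspace(500.0, 4000.0, n_freqs)  # Hz
candidates = np.arange(0, 181, 5)            # DOA classes in degrees
true_doas = np.array([40, 120])

owner = rng.integers(0, 2, size=(n_freqs, n_frames))  # dominant speaker per bin
S = rng.standard_normal((n_freqs, n_frames)) + 1j * rng.standard_normal((n_freqs, n_frames))
ph = steering_phases(true_doas, mic_pos, freqs)       # (2, M, F)
ph_mft = np.where(owner[None] == 0, ph[0][:, :, None], ph[1][:, :, None])  # (M, F, T)
X = S[None] * np.exp(1j * ph_mft)

labels = classify_bins(spatial_features(X), candidates, mic_pos, freqs)
print(estimate_speakers(labels, candidates, n_speakers=2))  # [ 40 120]
hist = frame_histograms(labels, len(candidates))
```

Because every bin carries its own DOA label, counting labels across frequency within each frame (`frame_histograms`) yields frame-by-frame estimates that could be smoothed over time to follow moving speakers. The abstract does not specify how the network's per-bin output is post-processed, so this aggregation step is also an assumption.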

One-sentence Summary: A multi-speaker localisation algorithm that operates in the time-frequency domain using a multi-microphone array and achieves state-of-the-art (SOTA) results.


Reviewed Version (pdf): https://openreview.net/references/pdf?id=oZCvSa8Qe
