Multi-Speaker DOA Tracking Algorithm Utilizing Probability Hypothesis Density Filter and Weighted Histogram of SRP-PHAT

Published: 01 Jan 2024, Last Modified: 12 May 2025, IWAENC 2024, CC BY-SA 4.0
Abstract: This contribution presents an algorithm for tracking the directions of arrival (DOAs) of concurrent speakers in reverberant environments. The algorithm operates in two stages and leverages the sparsity of speech in the short-time Fourier transform (STFT) domain. In the first stage, a set of DOAs is computed for each batch of time frames. Initially, a single narrow-band (NB) DOA is selected per time-frequency (TF) bin, relying on the W-disjoint orthogonality property of speech. The NB DOA is obtained as the location of the maximum of the steered response power phase transform (SRP-PHAT) localization spectrum at that TF bin, together with a quality measure describing the confidence in the estimate. The NB DOAs are then combined into a localization spectrum using a weighted histogram, with the quality measures serving as weights, and the set of DOAs is determined by identifying peaks in this spectrum. The collection of DOAs is modeled as a random finite set (RFS). In the second stage, the probability hypothesis density (PHD) filter is applied to estimate and track the speakers’ DOAs over a sequence of batches. Information from the first stage is used to compute prior knowledge about the appearance of new speakers. Our experimental study demonstrates the superiority of the proposed algorithm over a baseline approach.
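The first stage for a single batch can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear microphone array along the x-axis, far-field propagation, a speed of sound of 343 m/s, and a 1° azimuth grid over 0 to 180°. All function names are hypothetical, the quality measure (SRP-PHAT peak height divided by the number of microphone pairs) is a stand-in because the abstract does not define it, and the peak picker (local maxima above a fraction of the global maximum) is likewise an assumption.

```python
import cmath
import math
from itertools import combinations

C = 343.0  # assumed speed of sound (m/s)


def pair_delay_diffs(mic_x, grid):
    """Return the mic pairs and, for every grid angle (degrees), the far-field
    delay differences tau_i - tau_j, where tau_m = -x_m * cos(theta) / C for a
    linear array along the x-axis (assumed geometry)."""
    pairs = list(combinations(range(len(mic_x)), 2))
    table = []
    for theta in grid:
        u = math.cos(math.radians(theta))
        table.append([-(mic_x[i] - mic_x[j]) * u / C for i, j in pairs])
    return pairs, table


def nb_doa(x_bin, omega, pairs, table, grid):
    """NB DOA = argmax of the single-bin SRP-PHAT spectrum.
    Quality (hypothetical): peak height / number of pairs, floored at 0."""
    phat = []
    for i, j in pairs:
        cross = x_bin[i] * x_bin[j].conjugate()
        phat.append(cross / abs(cross) if abs(cross) > 0 else 0j)
    best_idx, best_val = 0, -math.inf
    for idx, diffs in enumerate(table):
        # Steer each PHAT-normalized cross-spectrum to the candidate angle.
        val = sum((g * cmath.exp(1j * omega * d)).real
                  for g, d in zip(phat, diffs))
        if val > best_val:
            best_idx, best_val = idx, val
    return grid[best_idx], max(0.0, best_val / len(pairs))


def weighted_histogram(doas, weights, grid):
    """Accumulate each NB DOA's quality weight in its grid bin."""
    index = {theta: n for n, theta in enumerate(grid)}
    hist = [0.0] * len(grid)
    for theta, w in zip(doas, weights):
        hist[index[theta]] += w
    return hist


def pick_peaks(hist, grid, max_sources, rel_threshold):
    """Keep up to max_sources strongest local maxima that reach
    rel_threshold * global max; return their angles in ascending order."""
    top = max(hist)
    peaks = []
    for n, h in enumerate(hist):
        left = hist[n - 1] if n > 0 else -math.inf
        right = hist[n + 1] if n + 1 < len(hist) else -math.inf
        if h > 0 and h > left and h >= right and h >= rel_threshold * top:
            peaks.append((h, grid[n]))
    peaks.sort(reverse=True)
    return sorted(theta for _, theta in peaks[:max_sources])


def batch_doa_set(X, freqs, mic_x, grid, max_sources=3, rel_threshold=0.3):
    """First stage for one batch; X[m][t][k] is the multichannel STFT."""
    pairs, table = pair_delay_diffs(mic_x, grid)
    doas, weights = [], []
    for t in range(len(X[0])):
        for k, f in enumerate(freqs):
            x_bin = [X[m][t][k] for m in range(len(mic_x))]
            theta, q = nb_doa(x_bin, 2 * math.pi * f, pairs, table, grid)
            doas.append(theta)
            weights.append(q)
    hist = weighted_histogram(doas, weights, grid)
    return pick_peaks(hist, grid, max_sources, rel_threshold)
```

On a noise-free synthetic batch in which every TF bin is dominated by one of two plane-wave sources at 50° and 120° (a crude imitation of W-disjoint orthogonality), with four microphones spaced 4 cm apart and frequencies from 500 to 4000 Hz, `batch_doa_set` returns `[50, 120]`. Smoothing the histogram before peak picking is a common refinement that this sketch omits.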
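For the second stage, a one-dimensional Gaussian-mixture PHD (GM-PHD) filter is one common way to implement the PHD recursion. The abstract does not say whether the paper uses a Gaussian-mixture or a particle implementation, so the sketch below is an assumption throughout. It assumes a random-walk DOA model, constant survival and detection probabilities, uniform clutter over 0 to 180°, and a birth intensity of Gaussians centered at the previous batch's first-stage DOAs, a hypothetical stand-in for the paper's prior on new speakers. The class name `GMPHD1D` and all parameter values are illustrative.

```python
import math


def gauss(x, mean, var):
    """1-D Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


class GMPHD1D:
    """Gaussian-mixture PHD filter over azimuth in degrees. Components are
    (weight, mean, variance) tuples. Parameter values are illustrative."""

    def __init__(self, p_s=0.99, p_d=0.9, q_var=4.0, r_var=9.0,
                 clutter_rate=1.0, fov=180.0, birth_w=0.1, birth_var=25.0,
                 prune_t=1e-3, merge_u=4.0, max_comp=50):
        self.p_s, self.p_d = p_s, p_d              # survival / detection probs
        self.q_var, self.r_var = q_var, r_var      # process / measurement noise
        self.kappa = clutter_rate / fov            # uniform clutter intensity
        self.birth_w, self.birth_var = birth_w, birth_var
        self.prune_t, self.merge_u, self.max_comp = prune_t, merge_u, max_comp
        self.comps = []
        self.prev_doas = []

    def step(self, doas):
        """One predict/update cycle for a batch's first-stage DOA set."""
        # Predict surviving speakers with a random-walk DOA model.
        pred = [(self.p_s * w, m, v + self.q_var) for w, m, v in self.comps]
        # Hypothetical birth prior: Gaussians at the previous batch's DOAs.
        pred += [(self.birth_w, z, self.birth_var) for z in self.prev_doas]
        # Missed-detection terms keep the predicted mean and variance.
        updated = [((1 - self.p_d) * w, m, v) for w, m, v in pred]
        # Detection terms: one Kalman update per (measurement, component).
        for z in doas:
            cand = []
            for w, m, v in pred:
                s = v + self.r_var
                k = v / s
                cand.append((self.p_d * w * gauss(z, m, s),
                             m + k * (z - m), (1 - k) * v))
            total = self.kappa + sum(c[0] for c in cand)
            updated += [(w / total, m, v) for w, m, v in cand]
        self.comps = self._prune_merge(updated)
        self.prev_doas = list(doas)
        return self.estimates()

    def _prune_merge(self, comps):
        """Drop low-weight components; merge those close to the strongest."""
        comps = [c for c in comps if c[0] > self.prune_t]
        merged = []
        while comps:
            _, mj, _ = max(comps)  # strongest remaining component
            close = [c for c in comps if (c[1] - mj) ** 2 / c[2] <= self.merge_u]
            comps = [c for c in comps if (c[1] - mj) ** 2 / c[2] > self.merge_u]
            w = sum(c[0] for c in close)
            m = sum(c[0] * c[1] for c in close) / w
            v = sum(c[0] * (c[2] + (c[1] - m) ** 2) for c in close) / w
            merged.append((w, m, v))
        merged.sort(reverse=True)
        return merged[:self.max_comp]

    def estimates(self):
        """Report round(w) copies of the mean of each component with w > 0.5."""
        out = []
        for w, m, _ in self.comps:
            if w > 0.5:
                out += [m] * int(round(w))
        return sorted(out)
```

Because births are seeded from the previous batch, a new speaker is first reported one batch after its DOA appears. With these parameters, a DOA that shows up in only one batch (for example, a spurious histogram peak at 90° among steady speakers at 60° and 120°) does not accumulate enough weight to be extracted.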