Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization

Dongwen Ying, Ruohua Zhou, Junfeng Li, Yonghong Yan

Published: 2017, Last Modified: 15 May 2025. IEEE ACM Trans. Audio Speech Lang. Process. 2017. CC BY-SA 4.0

Abstract: Signal subspace has been widely exploited to localize multiple speech sources. However, most signal subspace methods cannot count the number of sources and do not make use of speech sparsity in the frequency domain. This paper presents a grid-search window-dominant signal subspace (GS-WDSS) method and a closed-form WDSS (CF-WDSS) method to localize short-term speech sources. These methods are based on the generalized sparsity assumption that each window containing some time-adjacent bins is dominated by one source, as opposed to the conventional assumption that each individual bin is dominated by one source. The generalized assumption enables the principal eigenvector of the spatial correlation matrix on each window to span the signal subspace of the window-dominant source. The direction-of-arrival (DOA) of the dominant source is estimated from the principal eigenvector. The DOAs and the number of sources are finally summarized from the DOA histogram of all dominant sources. The conventional assumption is a special case of the generalized assumption. By using the generalized assumption, the performance in estimating DOAs of the window-dominant sources is significantly improved at the cost of an acceptable masking effect. The superiority of the proposed methods is verified by simulated and real experiments.
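The pipeline described in the abstract (per-window spatial correlation matrix, principal eigenvector, DOA of the dominant source, DOA histogram) can be illustrated with a minimal sketch. This is not the authors' GS-WDSS or CF-WDSS implementation; it is a grid-search illustration under assumptions the abstract does not state: a 4-microphone uniform linear array with 4 cm spacing, a single frequency of 2 kHz, far-field sources, synthetic data in which each window is dominated by exactly one source, and a simple peak rule on a 5-degree histogram. All function names and parameters (`steering`, `window_doa`, `summarize`, `bin_width`, `min_share`) are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def steering(theta_deg, freq_hz, mic_x):
    """Far-field steering vector of a linear array (assumed geometry)."""
    delays = mic_x * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq_hz * delays)

def window_doa(X, freq_hz, mic_x, grid_deg):
    """Estimate the DOA of the source dominating one window.

    X has shape (n_mics, n_frames): time-adjacent bins at one frequency.
    """
    # Spatial correlation matrix of the window.
    R = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order, so the last
    # column is the principal eigenvector.
    _, vecs = np.linalg.eigh(R)
    u = vecs[:, -1]
    # Grid search: pick the steering vector best aligned with u.
    A = np.stack([steering(t, freq_hz, mic_x) for t in grid_deg])
    scores = np.abs(A.conj() @ u)
    return grid_deg[np.argmax(scores)]

def summarize(doas, bin_width=5.0, min_share=0.2):
    """Read DOAs and source count off the histogram of window DOAs.

    The peak rule (local maximum holding at least min_share of all
    windows) is an illustrative assumption, not taken from the paper.
    """
    edges = np.arange(0.0, 180.0 + bin_width, bin_width)
    counts, _ = np.histogram(doas, bins=edges)
    padded = np.concatenate(([0], counts, [0]))
    is_peak = (
        (counts >= min_share * len(doas))
        & (counts > padded[:-2])
        & (counts >= padded[2:])
    )
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[is_peak]

# Synthetic demo: two sources, each window dominated by one of them.
rng = np.random.default_rng(0)
mic_x = 0.04 * np.arange(4)          # microphone positions in metres
freq_hz = 2000.0
grid = np.arange(0.0, 181.0, 1.0)    # candidate DOAs in degrees
true_doas = [62.0, 118.0]

window_doas = []
for w in range(40):
    theta = true_doas[w % 2]
    s = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    noise = 0.01 * (rng.standard_normal((4, 8))
                    + 1j * rng.standard_normal((4, 8)))
    X = np.outer(steering(theta, freq_hz, mic_x), s) + noise
    window_doas.append(window_doa(X, freq_hz, mic_x, grid))

estimated = summarize(np.array(window_doas))
print("number of sources:", len(estimated))
print("estimated DOAs (bin centres, degrees):", estimated)
```

The number of sources is never passed in: it falls out of the histogram as the number of peaks, which is the property the abstract highlights. Setting the window length to a single frame reduces the sketch to the conventional per-bin assumption.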