Joint Separation and Tracking of Moving Sources with Distributed Microphone Arrays Based on Time-Varying Inertial Spatial Models

Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii

Published: 2025, Last Modified: 15 May 2026, Venue: APSIPA 2025, License: CC BY-SA 4.0

Abstract: This paper describes the first attempt at separation and tracking (3D localization) of multiple moving sound sources using multiple microphone arrays fixed at known locations in an indoor environment. For static sources, location-dependent priors have been placed on the time-invariant spatial covariance matrices (SCMs) of sources in the statistical framework of blind source separation based on multichannel nonnegative matrix factorization (MNMF), enabling maximum likelihood estimation of source locations. One may thus make both the SCMs and their priors vary over time to deal with source movements. This naive extension, however, fails to localize sources while they are inactive, yielding non-smooth, discontinuous trajectory estimates. To solve this problem, we formulate a hierarchical probabilistic model for multichannel mixture signals that consists of inertial Markov models for source locations, location-aware moving-average models for source SCMs, and NMF-based low-rank models for the power spectral densities (PSDs) of sources. All the time-varying attributes of sources are jointly estimated under a maximum a posteriori (MAP) principle, and the source images are then estimated with a multichannel Wiener filter. An experiment using simulated data with two moving sources and four four-channel arrays showed that the proposed method achieved better separation and smoother localization.
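To make the role of an inertial prior on source locations concrete, the following is a minimal sketch, not the authors' formulation. It assumes a second-order (constant-velocity) Gaussian Markov model, which is one common way to encode inertia; the abstract does not specify the exact model. The names `predict_next` and `inertial_penalty` and the noise scale `sigma` are hypothetical.

```python
import random


def predict_next(prev, prev2):
    """Constant-velocity prediction: repeat the last displacement.

    Assumed form (not taken from the paper): p_t ~ 2 * p_{t-1} - p_{t-2}.
    """
    return tuple(2.0 * a - b for a, b in zip(prev, prev2))


def inertial_penalty(track, sigma=0.05):
    """Negative log-prior of a 3D track, up to an additive constant.

    Each location is penalized by its squared distance from the
    constant-velocity prediction, scaled by an isotropic Gaussian
    standard deviation `sigma` (a hypothetical value).
    """
    total = 0.0
    for t in range(2, len(track)):
        pred = predict_next(track[t - 1], track[t - 2])
        total += sum((x - p) ** 2 for x, p in zip(track[t], pred))
    return total / (2.0 * sigma ** 2)


# A straight-line track and the same track with per-frame jitter.
smooth = [(0.1 * t, 0.05 * t, 1.5) for t in range(20)]
rng = random.Random(0)
jittery = [tuple(c + rng.gauss(0.0, 0.1) for c in p) for p in smooth]

penalty_smooth = inertial_penalty(smooth)
penalty_jittery = inertial_penalty(jittery)
```

Under this assumed prior, the straight-line track incurs an essentially zero penalty while the jittery one is penalized heavily. When a source is silent and the observations carry no location evidence, a prior of this kind lets a MAP estimate continue along the predicted path rather than jump arbitrarily.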

External IDs: dblp:conf/apsipa/NiheiBNCUIY25