Dynamic group sparsity for non-negative matrix factorization with application to unsupervised source separation

Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan

Published: 2016, Last Modified: 15 May 2025. Venue: IWAENC 2016. License: CC BY-SA 4.0

Abstract: Non-negative matrix factorization (NMF) is an appealing technique for audio source separation. Sparsity constraints are commonly imposed on the NMF model to discover a small number of dominant patterns. Recently, group sparsity has been proposed for NMF-based methods, in which basis vectors belonging to the same group are permitted to activate together, while activations across groups are suppressed. However, most group sparsity functions activate the groups globally, without considering the dynamics of the speech spectra across frames. In this paper, we propose dynamic group sparsity to model both the spectral dynamics and the temporal continuity of the speech signal, and we investigate its potential benefit for separating speech from other sound sources. Experimental results show that the proposed dynamic group sparsity improves performance over regular group sparsity in unsupervised settings where neither the speaker identity nor the type of noise is known in advance.
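To illustrate the general idea of group sparsity mentioned in the abstract, the following is a minimal sketch of a generic mixed l2/l1 group penalty on an NMF activation matrix, evaluated frame by frame. It is not the dynamic group sparsity function proposed in the paper, whose exact form the abstract does not give. The function name `group_l21_penalty`, the group layout, and the toy activation matrices are all assumptions made for illustration.

```python
import numpy as np

def group_l21_penalty(H, groups):
    """Generic group-sparsity penalty (illustrative, not the paper's function).

    H      : (n_basis, n_frames) non-negative activation matrix.
    groups : list of index lists, one per group of basis vectors.
    Returns the sum, over groups and frames, of the l2 norm of each
    group's activations in that frame.
    """
    H = np.asarray(H, dtype=float)
    total = 0.0
    for idx in groups:
        # l2 norm within the group for every frame, then summed over frames
        total += np.sqrt((H[idx, :] ** 2).sum(axis=0)).sum()
    return total

# Hypothetical layout: four basis vectors split into two groups
groups = [[0, 1], [2, 3]]

# Same total activation mass in both cases (entries sum to 4)
H_grouped = np.array([[1.0, 1.0],   # all activity inside group 0
                      [1.0, 1.0],
                      [0.0, 0.0],
                      [0.0, 0.0]])
H_spread = np.array([[1.0, 1.0],    # activity split across both groups
                     [0.0, 0.0],
                     [1.0, 1.0],
                     [0.0, 0.0]])

penalty_grouped = group_l21_penalty(H_grouped, groups)
penalty_spread = group_l21_penalty(H_spread, groups)
print(penalty_grouped < penalty_spread)
```

With equal activation mass, the penalty is lower when the active basis vectors fall in one group than when they are spread across groups, which is the behaviour the abstract describes: activations within a group are permitted together, while activations across groups are discouraged.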