- Keywords: Multichannel Speech Enhancement, Self-supervised, Tensor Decomposition, Sparse
- TL;DR: This paper proposes a self-supervised method to discover higher-order structures in multichannel speech data without any labels.
- Abstract: Tensor-based speech/audio representations have been successfully applied in multichannel speech enhancement (MSE). In the literature, some researchers represent multichannel speech waveforms as 3-dimensional tensors and design unsupervised methods to discover the inherent correlation structure in the observed multichannel data. These algorithms generally adopt an alternating least squares approach, which may need several iterations to converge. In this paper, we turn to tensor decomposition theory and propose a self-supervised method to learn sparse tensor representations for MSE. Specifically, we attempt to obtain factor matrices that adaptively transform the input noisy tensor into an approximately sparse core tensor. MSE is then achieved by manipulating the coefficients of the core tensor according to their amplitude. Simulations show that the proposed algorithm can significantly reduce spatially white noise while causing little speech distortion.
- Double Submission: No
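
The pipeline outlined in the abstract (factor matrices, an approximately sparse core tensor, amplitude-based manipulation of core coefficients) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the factor matrices come from a plain higher-order SVD (HOSVD) rather than the proposed self-supervised learning, the "manipulation" is hard thresholding that keeps only the largest-magnitude coefficients, and the names `unfold`, `mode_product`, `denoise`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply every mode-`mode` fiber of `tensor` by `matrix`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def denoise(noisy, keep_ratio=0.02):
    """Illustrative stand-in (assumption): HOSVD factors + hard thresholding."""
    # Factor matrices: left singular vectors of each mode-n unfolding.
    factors = [
        np.linalg.svd(unfold(noisy, mode), full_matrices=False)[0]
        for mode in range(noisy.ndim)
    ]

    # Core tensor: project the noisy tensor onto the factor matrices.
    core = noisy
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)

    # Keep only the largest-magnitude core coefficients, zero the rest.
    threshold = np.quantile(np.abs(core), 1.0 - keep_ratio)
    core = core * (np.abs(core) >= threshold)

    # Map the sparsified core back to the signal domain.
    enhanced = core
    for mode, u in enumerate(factors):
        enhanced = mode_product(enhanced, u, mode)
    return enhanced
```

Calling `denoise(x, keep_ratio=0.02)` on a tensor `x` (for example, arranged as channels × frames × samples, which is itself an assumed layout) returns a tensor of the same shape. With `keep_ratio=1.0` no coefficient is discarded and the input is reconstructed unchanged, which is a convenient sanity check for the transform.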