Abstract: Time-frequency (T-F) domain masking is currently the dominant approach to single-channel speech enhancement, yet phase information has received little attention. This paper proposes a speech enhancement method named PHASEN-SS. The method has two stages: first, a two-branch deep neural network (DNN) that combines mask and phase prediction enhances the noisy speech; second, the DNN output is post-processed. PHASEN-SS uses the two branches to predict the amplitude mask and the phase separately, exchanging information between the branches to improve prediction accuracy, and then applies spectral subtraction to suppress the residual noise. Experiments are conducted on the publicly available Voice Bank + DEMAND dataset and on a noisy speech dataset synthesized by mixing Voice Bank clean speech with 4 common noise types from Noise92 at specified signal-to-noise ratios (SNRs). The results show that the proposed method improves on the original method and, among the SNRs tested, is more robust to speech containing babble noise at the higher SNRs.
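The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the noise spectrum is estimated or whether subtraction operates on magnitude or power, so the sketch assumes magnitude subtraction with a noise estimate averaged over the first few frames. The function names `apply_mask_and_phase` and `spectral_subtract`, and the parameters `noise_frames`, `alpha`, and `floor`, are hypothetical.

```python
import numpy as np


def apply_mask_and_phase(noisy_mag, mask, phase):
    """Combine a predicted amplitude mask and a predicted phase.

    All inputs are real arrays of shape (freq_bins, frames).
    Returns the complex enhanced spectrogram.
    """
    enhanced_mag = np.asarray(mask, dtype=float) * np.asarray(noisy_mag, dtype=float)
    return enhanced_mag * np.exp(1j * np.asarray(phase, dtype=float))


def spectral_subtract(mag, noise_frames=5, alpha=1.0, floor=0.01):
    """Magnitude spectral subtraction on a (freq_bins, frames) array.

    Assumption (not from the abstract): the first `noise_frames` frames
    contain no speech, so their mean approximates the residual noise.
    """
    mag = np.asarray(mag, dtype=float)
    # Per-frequency noise estimate, kept as a column for broadcasting.
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    # Subtract the (optionally over-weighted) noise estimate.
    cleaned = mag - alpha * noise_mag
    # Spectral floor: never drop below a small fraction of the input,
    # which also prevents negative magnitudes.
    return np.maximum(cleaned, floor * mag)


def post_process(spec, **kwargs):
    """Apply spectral subtraction to the magnitude and keep the phase."""
    spec = np.asarray(spec)
    cleaned_mag = spectral_subtract(np.abs(spec), **kwargs)
    return cleaned_mag * np.exp(1j * np.angle(spec))
```

In this sketch the DNN outputs would be passed to `apply_mask_and_phase`, and the result to `post_process`; an inverse short-time Fourier transform would then recover the waveform.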