Abstract: Noise suppression systems generally produce output speech with
compromised quality. We propose to exploit the high-quality speech
generation capability of neural vocoders for noise suppression. We
use a neural network to predict clean mel-spectrogram features from
noisy speech and then compare two neural vocoders, WaveNet and
WaveGlow, for synthesizing clean speech from the predicted mel
spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model
Chimera++. Both vocoders also achieve significantly better subjective quality ratings than the oracle Wiener
mask. Of the two vocoders, WaveNet achieves the better subjective quality scores, although
at the cost of much slower waveform generation.
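The oracle Wiener mask used as a reference above is an "oracle" because it is computed from the separate clean and noise signals, which are only available for simulated mixtures. The sketch below illustrates the common power-ratio form of the mask, |S|^2 / (|S|^2 + |N|^2), applied per time-frequency bin to the noisy STFT while the noisy phase is reused. This is an illustration under assumptions: the abstract does not state the exact mask variant or STFT settings, and the function names `oracle_wiener_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

def oracle_wiener_mask(clean_stft, noise_stft, eps=1e-12):
    """Power-ratio Wiener mask |S|^2 / (|S|^2 + |N|^2) per time-frequency bin.

    Assumption: additive noise with separately known clean (S) and noise (N)
    STFTs, which is what makes this an oracle rather than a deployable method.
    """
    clean_power = np.abs(clean_stft) ** 2
    noise_power = np.abs(noise_stft) ** 2
    # eps avoids division by zero in silent bins.
    return clean_power / (clean_power + noise_power + eps)

def apply_mask(noisy_stft, mask):
    """Scale each noisy STFT bin by the real-valued mask; the noisy phase is kept."""
    return mask * noisy_stft

# Toy example: 3 frequency bins x 2 frames of complex STFT values.
rng = np.random.default_rng(0)
S = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
N = 0.5 * (rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2)))
Y = S + N  # additive noise model: noisy = clean + noise

M = oracle_wiener_mask(S, N)
S_hat = apply_mask(Y, M)  # masked estimate; an inverse STFT would give the waveform
print(M.round(3))
```

Because the mask only rescales noisy STFT bins and keeps the noisy phase, its output quality is bounded by the noisy input; the vocoder approach instead regenerates the waveform from the predicted mel spectrogram.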