Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement
Abstract: Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high-quality speech
generation capability of neural vocoders for better-quality speech enhancement. We term this parametric resynthesis (PR). In previous
work, we showed that PR systems generate high-quality speech for
a single speaker using two neural vocoders, WaveNet and WaveGlow. Both of these vocoders are traditionally speaker dependent. Here
we first show that, when trained on data from enough speakers, these
vocoders can generate speech from unseen speakers, both male and
female, with quality similar to that of speakers seen in training. Next, using these two vocoders and a new vocoder, LPCNet, we evaluate the
noise reduction quality of PR on unseen speakers and show that objective signal and overall quality are higher than those of the state-of-the-art
speech enhancement systems Wave-U-Net, Wavenet-denoise, and
SEGAN. Moreover, in subjective quality, multiple-speaker PR outperforms the oracle Wiener mask.