Abstract: We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network (RNN) that accepts text or phonemes as input, while the decoder is an RNN with attention that produces vocoder acoustic features. The neural vocoder is a conditional extension of SampleRNN that generates raw waveform samples from these intermediate representations. Unlike traditional speech synthesis models, Char2Wav learns to produce audio directly from text.
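To make the reader's data flow concrete, here is a minimal, untrained NumPy sketch of an attention-based encoder-decoder that maps character IDs to a sequence of acoustic feature frames. It is not the authors' implementation. Several choices are assumptions made for illustration: plain tanh RNN cells (rather than gated units), simple dot-product attention (the attention mechanism used in the paper may differ), randomly initialized weights, and the hypothetical sizes `VOCAB`, `EMB`, `HID`, and `FEAT`. The SampleRNN vocoder that would turn these frames into raw waveform samples is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID, FEAT = 30, 16, 32, 8  # hypothetical sizes, not from the paper


def init(n_out, n_in):
    # Small random weights; a real model would learn these.
    return rng.normal(0.0, 0.1, size=(n_out, n_in))


def rnn(xs, W_x, W_h, b):
    """Plain tanh RNN (an assumption); returns the hidden state at every step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)


embed = init(VOCAB, EMB)
fwd = (init(HID, EMB), init(HID, HID), np.zeros(HID))
bwd = (init(HID, EMB), init(HID, HID), np.zeros(HID))


def encode(char_ids):
    """Bidirectional encoder: concatenate forward and backward states."""
    xs = embed[char_ids]                  # (T, EMB)
    h_f = rnn(xs, *fwd)                   # (T, HID)
    h_b = rnn(xs[::-1], *bwd)[::-1]       # (T, HID), re-aligned to input order
    return np.concatenate([h_f, h_b], axis=1)  # (T, 2*HID) annotations


dec_W_x = init(HID, 2 * HID + FEAT)
dec_W_h = init(HID, HID)
dec_b = np.zeros(HID)
query = init(2 * HID, HID)
out_proj = init(FEAT, HID + 2 * HID)


def decode(annotations, n_frames):
    """Attention decoder emitting one acoustic feature frame per step."""
    s = np.zeros(HID)
    prev = np.zeros(FEAT)
    frames, alignments = [], []
    for _ in range(n_frames):
        scores = annotations @ (query @ s)            # (T,) dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                      # softmax over input positions
        context = weights @ annotations               # (2*HID,) weighted summary
        s = np.tanh(dec_W_x @ np.concatenate([context, prev]) + dec_W_h @ s + dec_b)
        prev = out_proj @ np.concatenate([s, context])  # (FEAT,) next frame
        frames.append(prev)
        alignments.append(weights)
    return np.stack(frames), np.stack(alignments)


text = "hello world"
char_ids = [ord(c) % VOCAB for c in text]  # toy character-to-ID mapping
features, attn = decode(encode(char_ids), n_frames=20)
print(features.shape, attn.shape)  # (20, 8) (20, 11)
```

Each row of `attn` is a distribution over the 11 input characters, showing which part of the text the decoder attends to while producing each frame; in a trained model these rows would move monotonically through the input.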
TL;DR: Unlike traditional models for speech synthesis, Char2Wav learns to produce audio directly from text.
Keywords: Speech, Deep learning, Applications
Conflicts: umontreal.ca, inrs.ca, iitk.ac.in