How to make someone speak a language that they don't know.

Anonymous

Oct 30, 2018 NIPS 2018 Workshop IRASL Blind Submission readers: everyone
  • Abstract: We present a simple idea that makes it possible to record a speaker in one language and synthesize their voice in other languages, including languages they may not know. These techniques open a wide range of potential applications, such as cross-language communication, language learning, and automatic video dubbing. We call this general problem multi-language speaker-conditioned speech synthesis, and we present a simple but strong baseline for it. Our model architecture is similar to encoder-decoder models such as Char2Wav and Tacotron. The main difference is that, instead of conditioning on characters or phonemes that are specific to a given language, we condition on a shared phonetic representation that is universal to all languages. This cross-language phonetic representation of text makes it possible to synthesize speech in any language while preserving the vocal characteristics of the original speaker. Furthermore, we show that fine-tuning the weights of our model extends these results to speakers outside the training dataset.
  • Keywords: Speech synthesis, Voice cloning, TTS
  • TL;DR: We present a simple idea that makes it possible to record a speaker in one language and synthesize their voice in other languages, including languages they may not know.
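
The shared phonetic conditioning described in the abstract can be illustrated with a minimal sketch. It is not the submission's implementation: the toy lexicons, the IPA-like symbols, and the names `LEXICONS`, `build_shared_vocab`, and `encode` are all hypothetical. The point is only that text from different languages is mapped into one phoneme inventory, so a single encoder input space can serve every language.

```python
from typing import Dict, List

# Hypothetical toy lexicons: word -> IPA-like phoneme sequence.
# A real system would use a per-language grapheme-to-phoneme front end.
LEXICONS: Dict[str, Dict[str, List[str]]] = {
    "en": {"no": ["n", "oʊ"], "me": ["m", "i"]},
    "es": {"no": ["n", "o"], "mi": ["m", "i"]},
}


def build_shared_vocab(lexicons: Dict[str, Dict[str, List[str]]]) -> Dict[str, int]:
    """Collect a single phoneme inventory covering all languages."""
    symbols = sorted(
        {p for lex in lexicons.values() for pron in lex.values() for p in pron}
    )
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(text: str, lang: str, vocab: Dict[str, int]) -> List[int]:
    """Map text in a given language to IDs in the shared phoneme vocabulary."""
    ids: List[int] = []
    for word in text.lower().split():
        ids.extend(vocab[p] for p in LEXICONS[lang][word])
    return ids


VOCAB = build_shared_vocab(LEXICONS)

# English "me" and Spanish "mi" are spelled differently but share phonemes,
# so they produce the same model input under this representation.
same_input = encode("me", "en", VOCAB) == encode("mi", "es", VOCAB)
```

In a model of the kind the abstract describes, these IDs would replace language-specific characters as the encoder input, with speaker identity supplied as a separate conditioning signal.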