Synthesizing Audio with GANs

Chris Donahue, Julian McAuley, Miller Puckette

Feb 12, 2018 (modified: Feb 12, 2018) ICLR 2018 Workshop Submission
  • Abstract: While Generative Adversarial Networks (GANs) have seen wide success at the problem of synthesizing realistic images, they have seen little application to audio generation. In this paper, we introduce WaveGAN, a first attempt at applying GANs to raw audio synthesis in an unsupervised setting. Our experiments on speech demonstrate that WaveGAN can produce intelligible words from a small vocabulary of human speech, as well as synthesize audio from other domains such as bird vocalizations, drums, and piano. Qualitatively, we find that human judges prefer the generated examples from WaveGAN over those from a method which naïvely applies GANs on image-like audio feature representations.
  • TL;DR: Applying GANs to raw audio generation on several sound domains (speech, bird vocalizations, drums, piano)
  • Keywords: audio, GAN, adversarial