MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville

06 Sept 2019 (modified: 05 May 2023), NeurIPS 2019, Readers: Everyone
Abstract: Previous works (Donahue et al., 2018a; Engel et al., 2019) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation (Mean Opinion Score) suggests that our model is state-of-the-art for mel-spectrogram inversion. We show qualitative results on speech synthesis, music domain translation and unconditional music synthesis to establish the generality of the proposed techniques. We also evaluate different components of the model and propose a set of guidelines for designing general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware-specific optimization tricks.
Code Link: https://github.com/descriptinc/melgan-neurips
CMT Num: 8485
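
The speed figures in the abstract ("100x faster than real time") can be read as a ratio of audio duration to synthesis time. The following is a minimal sketch of that arithmetic, not the authors' benchmarking code: the function name `realtime_factor`, the 22,050 Hz sample rate, and the timing values are all illustrative assumptions rather than numbers taken from the paper.

```python
def realtime_factor(num_samples: int, sample_rate_hz: int, synthesis_seconds: float) -> float:
    """Return seconds of audio produced per second of compute.

    A value above 1.0 means synthesis is faster than real time.
    """
    if sample_rate_hz <= 0 or synthesis_seconds <= 0:
        raise ValueError("sample_rate_hz and synthesis_seconds must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return audio_seconds / synthesis_seconds


# Hypothetical measurement: 10 s of audio at an assumed 22,050 Hz,
# generated in an assumed 0.1 s of wall-clock time.
ASSUMED_SAMPLE_RATE_HZ = 22_050
num_samples = 10 * ASSUMED_SAMPLE_RATE_HZ
factor = realtime_factor(num_samples, ASSUMED_SAMPLE_RATE_HZ, synthesis_seconds=0.1)
print(f"{factor:.1f}x real time")
```

In practice the synthesis time would come from timing the model's forward pass (for example with `time.perf_counter`), averaged over several runs.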