Progressive Upsampling Audio Synthesis via Effective Adversarial Training

Youngwoo Cho; Minwook Chang; Gerard Jounghyun Kim; Jaegul Choo

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

TL;DR: We propose a novel raw-waveform generation method that uses only 5% of the parameters of the existing model (WaveGAN) while maintaining output quality.

Abstract: This paper proposes PUGAN, a novel generative model that progressively synthesizes high-quality audio as a raw waveform. PUGAN builds on the recently proposed idea of progressively generating higher-resolution images by stacking multiple encoder-decoder architectures. To apply this idea effectively to raw audio generation, we propose two novel modules: (1) a neural upsampling layer and (2) a sinc convolutional layer. Compared with the existing state-of-the-art model, WaveGAN, which uses a single decoder architecture, our model generates audio signals and progressively converts them to a higher resolution while using significantly fewer parameters, e.g., 20x fewer for 44.1kHz output. Our experiments show that audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human evaluation.
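
The abstract names the two modules but does not describe their internals, so the following is only a classical signal-processing analogue and not PUGAN's layers: a fixed windowed-sinc low-pass kernel, used to double the sample rate of a waveform. It may help illustrate why a sinc-shaped filter is a natural companion to upsampling, since it suppresses the spectral image that upsampling introduces. The function names `sinc_lowpass_kernel` and `upsample_2x`, the Hamming window, and the tap count are all assumptions made for illustration; in a learned layer, quantities such as the cutoff would presumably be trainable.

```python
import numpy as np


def sinc_lowpass_kernel(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc low-pass FIR kernel, normalized to unit DC gain.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel is symmetric")
    fc = cutoff_hz / sample_rate                   # cutoff in cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2   # tap indices centered on zero
    kernel = 2 * fc * np.sinc(2 * fc * n)          # np.sinc(x) = sin(pi x) / (pi x)
    kernel *= np.hamming(num_taps)                 # taper to reduce ripple
    return kernel / kernel.sum()


def upsample_2x(signal, sample_rate, num_taps=101):
    """Double the sample rate: insert zeros, then low-pass at the old Nyquist."""
    signal = np.asarray(signal, dtype=float)
    stuffed = np.zeros(2 * len(signal))
    stuffed[::2] = signal                          # zeros between original samples
    kernel = sinc_lowpass_kernel(sample_rate / 2, 2 * sample_rate, num_taps)
    # The factor 2 restores the amplitude halved by zero insertion.
    return 2 * np.convolve(stuffed, kernel, mode="same")
```

Applying `upsample_2x` repeatedly gives a fixed, non-learned version of a progressive resolution increase (for example 11.025 kHz to 22.05 kHz to 44.1 kHz); the sample rates in that example are illustrative and are not stated in the abstract.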

Keywords: audio synthesis, sound effect generation, generative adversarial network, progressive training, raw-waveform
