DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong; Wei Ping; Jiaji Huang; Kexin Zhao; Bryan Catanzaro

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

Published: 12 Jan 2021, Last Modified: 22 Jun 2025ICLR 2021 OralReaders: Everyone

Keywords: diffusion probabilistic models, audio synthesis, speech synthesis, generative models

Abstract: In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audios in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.

One-sentence Summary: DiffWave is a versatile diffusion probabilistic model for waveform generation, which matches the state-of-the-art neural vocoder in terms of quality and can generate abundant realistic voices in time-domain without any conditional information.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Code: [![Papers with Code](/images/pwc_icon.svg) 11 community implementations](https://paperswithcode.com/paper/?openreview=a-xFK8Ymz5J)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/diffwave-a-versatile-diffusion-model-for/code)

Data: [LJSpeech](https://paperswithcode.com/dataset/ljspeech)

13 Replies

Loading