WaveFlow: A Compact Flow-based Model for Raw Audio

Wei Ping; Kainan Peng; Kexin Zhao; Zhao Song

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: flow-based models, raw audio, waveforms, speech synthesis, generative models

Abstract: In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood, without the complicated density distillation and auxiliary losses used in Parallel WaveNet. It provides a unified view of flow-based models for raw audio, including autoregressive flow (e.g., WaveNet) and bipartite flow (e.g., WaveGlow) as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize high-fidelity speech and obtain likelihood comparable to that of WaveNet, while requiring only a few sequential steps to generate very long waveforms. In particular, our small-footprint WaveFlow has only 5.91M parameters and can generate 22.05 kHz speech 15.39 times faster than real-time on a GPU without customized inference kernels.

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/waveflow-a-compact-flow-based-model-for-raw/code)
