WaveFluid: A New Adversarial Approach for Efficient High-Fidelity Speech Synthesis

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Probability Flow Models, Generative Adversarial Networks, Adversarial Training, Speech Synthesis, Mel-Spectrogram
TL;DR: An efficient high-fidelity speech synthesis model based on new adversarial training approaches.
Abstract: Probability flow based models for image and audio synthesis, such as denoising diffusion probabilistic models and Poisson flow generative models, can be interpreted as modeling the ground-truth distribution through a non-compressive passive fluid partial differential equation (PDE), where the initial fluid density equals the ground-truth distribution and the final fluid density equals the chosen prior distribution. In this work, we improve the neural network architecture and propose WaveFluid, a model for speech synthesis conditioned on mel-spectrograms. Instead of estimating the solution to a chosen linear PDE, such as the diffusion or Poisson equation used in previous works, WaveFluid learns a velocity field directly through adversarial training. Because the mel-spectrogram is a strong condition that restricts the possible audio to a small range, we divide our model into two stages and use reparameterization techniques to reduce the memory footprint and improve training efficiency. Experimental results show that our model compares favorably with previous vocoders in sample quality within 10 inference steps.
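As a rough illustration of sampling from a velocity field in a small number of steps, the following minimal sketch integrates dx/dt = v(x, t) backward from the prior (t = 1) to the data side (t = 0) with Euler steps. It is not the WaveFluid implementation: `toy_velocity`, `sample`, and the straight-line path are assumptions made for illustration, and the analytic field stands in for a learned, mel-conditioned network.

```python
import random

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field v(x, t | condition).

    Assumes the straight-line path x_t = (1 - t) * target + t * noise,
    along which the velocity is (x_t - target) / t.
    """
    return [(xi - ti) / t for xi, ti in zip(x, target)]

def sample(velocity, target, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) backward from t = 1 (prior) to t = 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # draw from a Gaussian prior
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = 1.0 - k * dt  # current time, running from 1 down to dt
        v = velocity(x, t, target)
        x = [xi - dt * vi for xi, vi in zip(x, v)]  # Euler step toward t = 0
    return x

# Toy "waveform" that the condition is assumed to pin down.
target = [0.5, -0.25, 1.0]
result = sample(toy_velocity, target, num_steps=10)
```

In this toy setting the field points straight at the target, so the final Euler step lands on it up to floating-point error. A learned field would only approximate such a path, which is why the number of inference steps matters for sample quality.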
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7572