Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization

Published: 10 Oct 2024, Last Modified: 27 Oct 2024Audio Imagination: NeurIPS 2024 WorkshopEveryoneRevisionsBibTeXCC BY 4.0
Keywords: speech, voice, synthesis, articulatory
Abstract: Articulatory synthesis seeks to replicate the human voice by modeling the physics of the vocal apparatus, offering interpretable and controllable speech production. However, such methods often require careful hand-tuning to invert acoustic signals to their articulatory parameters. We present VocalTrax, a method which performs this inversion automatically via optimizing an accelerated vocal tract model implementation. Experiments on diverse vocal datasets show significant improvements over existing methods in out-of-domain speech reconstruction, while also revealing persistent challenges in matching natural voice quality.
Submission Number: 42
Loading