TL;DR: We introduce a differentiable simulator for human speech production.
Abstract: Speech is a complex sensorimotor process that requires the coordination of hundreds of muscles, yet humans perform it almost automatically. Although computational models of how the vocal tract maps to speech have been studied for almost a century, the inverse process, reconstructing the dynamic vocal tract shape from its radiated speech, remains a challenge. Prior work that attempts to model the inverse process with acoustic simulation relies on sampling, which is inefficient due to the high dimensionality of the search space. To overcome this problem, we propose to model speech production with a differentiable simulator, which directly couples the vocal tract geometry and its acoustic output with analytic gradients, thereby efficiently modeling both the forward and inverse processes. Experiments show that our simulator can reconstruct human-like speech, enabling potential applications in studying infant language acquisition, psycholinguistics, and sensorimotor control.
Length: short paper (up to 4 pages)
Domain: methods
Author List Check: The author list is correctly ordered and I understand that additions and removals will not be allowed after the abstract submission deadline.
Anonymization Check: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and URLs that point to identifying information.
Submission Number: 75