Seeing Speech: Magnetic Resonance Imaging-Based Vocal Tract Deformation Visualization Using Cross-Modal Transformer
Abstract: Understanding speech production is essential to advancing speech science and can greatly improve our understanding of human motor control and dynamical systems during natural speech. Several medical imaging modalities have been used to visualize this dynamic process; among them, magnetic resonance imaging (MRI) provides a valuable tool for evaluating static postures. In this demo, we present our solution for visualizing vocal tract deformation by leveraging the correlation between MRI and acoustic signals. We first formulate the problem as a cross-modal prediction task and propose a novel cross-modal Transformer network for it, which allows us to infer the deformation of the vocal tract from acoustic signals alone. We then present an interactive framework that uses this network to visualize the deformation. We hope our solution can also be helpful in pronunciation training for children with speech sound disorders and in second language learning.