emg2speech: synthesizing speech from electromyography using self-supervised speech models

ACL ARR 2026 January Submission2536 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Neuromotor speech interfaces, electromyography (EMG), speech neuroprosthesis, accessible speech technology, brain-computer interfaces
Abstract: We present a neuromuscular speech interface that translates electromyographic (EMG) signals recorded from orofacial muscles during speech articulation directly into audio. We find that self-supervised speech representations (SS) are strongly linearly related to the electrical power of muscle activity: a simple linear mapping predicts EMG power from SS features with a correlation of *r* = 0.85. In addition, EMG power vectors associated with distinct articulatory gestures form structured, separable clusters. Together, these observations suggest that SS implicitly encode articulatory mechanisms, as reflected in EMG activity. Leveraging this structure, we map EMG signals into the SS space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory modeling or vocoder training. We demonstrate this system with a participant with amyotrophic lateral sclerosis (ALS), converting into audio the orofacial EMG recorded while she *silently* articulated speech.
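The linear probe described in the abstract (predicting EMG power from self-supervised speech features and measuring the correlation) can be sketched as follows. This is a minimal illustration with synthetic stand-in data: the array shapes, variable names, and noise level are assumptions, and real SS features would come from a pretrained speech model while EMG power would be computed from recorded muscle signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: T frames of self-supervised speech features
# (D_ss dimensions) and per-frame EMG power across C electrode channels.
# In the actual system these come from a speech model and EMG recordings;
# here they are synthetic, with a planted linear relationship plus noise.
T, D_ss, C = 500, 64, 8
W_true = rng.normal(size=(D_ss, C))
ss_feats = rng.normal(size=(T, D_ss))
emg_power = ss_feats @ W_true + 0.1 * rng.normal(size=(T, C))

# Fit a simple linear mapping by least squares: SS features -> EMG power.
W, *_ = np.linalg.lstsq(ss_feats, emg_power, rcond=None)
pred = ss_feats @ W

# Pearson correlation between predicted and observed EMG power (flattened
# across frames and channels), analogous to the reported r statistic.
r = np.corrcoef(pred.ravel(), emg_power.ravel())[0, 1]
print(f"r = {r:.2f}")
```

On this synthetic data the fit is nearly exact, so the correlation is close to 1; the paper's reported *r* = 0.85 reflects real recordings, where the relationship is strong but not noise-free.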
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: spoken language translation, corpus creation, benchmarking, language resources
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources
Languages Studied: English (EMG biosignal-to-speech conversion).
Submission Number: 2536