Abstract: In this article, we present a high-bandwidth *egocentric* neuromuscular speech interface for translating silently voiced speech articulations into text. Specifically, we collect electromyographic (EMG) signals from multiple articulatory sites on the face and neck as individuals articulate speech in an alaryngeal manner, and we use these signals to perform EMG-to-language translation. Such an interface is useful for restoring audible speech in individuals who have lost the ability to speak intelligibly due to laryngectomy, neuromuscular disease, stroke, or trauma-induced damage (e.g., from radiotherapy toxicity) to the speech articulators. Previous work has focused on training text or speech synthesis models by mapping EMG collected during *audible* speech articulation to corresponding time-aligned audio, or by transferring time-aligned audio targets from EMG collected during *audible* articulation to EMG collected during *silent* articulation. However, such paradigms are not suitable for individuals who have already lost the ability to articulate speech *audibly*. Here, we present an alignment-free EMG-to-language conversion approach that uses only EMG collected during *silently* articulated speech. Our method is trained on a large, general-domain English language corpus and is released as open source.
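The abstract describes an alignment-free EMG-to-language model but does not specify the architecture or training objective. As a rough illustration of the alignment-free paradigm, the sketch below pairs a small convolutional-recurrent EMG encoder with a CTC loss, one standard objective when no time-aligned targets are available; all module names, channel counts, and shapes are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: CTC is shown as one common alignment-free
# sequence objective; the paper's actual model and loss may differ.
import torch
import torch.nn as nn

class EMGToTextModel(nn.Module):
    """Maps multichannel EMG windows to per-frame character logits."""
    def __init__(self, n_channels=8, n_chars=29, hidden=256):
        super().__init__()
        # Temporal convolutions downsample raw EMG before sequence modeling.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        # Output classes include a CTC blank symbol at index 0.
        self.head = nn.Linear(2 * hidden, n_chars)

    def forward(self, emg):  # emg: (batch, time, channels)
        x = self.encoder(emg.transpose(1, 2)).transpose(1, 2)
        x, _ = self.rnn(x)
        return self.head(x)  # (batch, downsampled_time, n_chars)

model = EMGToTextModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

emg = torch.randn(4, 1000, 8)            # hypothetical batch of silent-speech EMG
targets = torch.randint(1, 29, (4, 20))  # character indices (0 reserved for blank)
logits = model(emg)
log_probs = logits.log_softmax(-1).transpose(0, 1)  # CTC expects (time, batch, classes)
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
```

Because CTC marginalizes over all monotonic alignments between the EMG frame sequence and the character sequence, no time-aligned audio targets are needed, which is what makes this family of objectives compatible with EMG recorded only during silent articulation.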