TL;DR: We present a non-invasive electromyogram speech neuroprosthesis that can enable natural communication for individuals with clinical etiologies affecting voicing and articulator movement.
Abstract: In this article, we present a high-bandwidth *egocentric* neuromuscular speech interface for translating silently voiced speech articulations into text and audio. Specifically, we collect electromyogram (EMG) signals from multiple articulatory sites on the face and neck as individuals articulate speech in an alaryngeal manner, and perform EMG-to-text or EMG-to-audio translation. Such an interface is useful for restoring audible speech in individuals who have lost the ability to speak intelligibly due to laryngectomy, neuromuscular disease, stroke, or trauma-induced damage (e.g., radiotherapy toxicity) to speech articulators. Previous works have focused on training text or speech synthesis models using EMG collected during *audible* speech articulations, or on transferring audio targets from EMG collected during *audible* articulation to EMG collected during *silent* articulation. However, such paradigms are not suited for individuals who have already lost the ability to *audibly* articulate speech. We are the first to present alignment-free EMG-to-text and EMG-to-audio conversion using only EMG collected during *silently* articulated speech, released in an open-source manner. On a limited-vocabulary corpus, our approach achieves an almost $2.4\times$ improvement in word error rate with a model that is $25\times$ smaller, by leveraging the inherent geometry of EMG.
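The abstract's reference to "the inherent geometry of EMG" (together with the SPD-matrix and Riemannian-geometry keywords) can be illustrated with a common construction from the Riemannian EMG/EEG literature: windowed channel covariance matrices, which are symmetric positive definite (SPD), mapped to Euclidean feature vectors via the log-Euclidean (matrix logarithm) map. This is a hypothetical sketch for intuition only, not the paper's actual pipeline; the window length, regularization `eps`, and function name are illustrative assumptions.

```python
import numpy as np

def emg_spd_features(emg, win=200, eps=1e-6):
    """Map multichannel EMG (channels x samples) to per-window SPD
    covariance matrices, then to tangent-space feature vectors via
    the log-Euclidean map. Illustrative sketch, not the paper's model."""
    n_ch, n_s = emg.shape
    feats = []
    for start in range(0, n_s - win + 1, win):
        seg = emg[:, start:start + win]
        # Regularized covariance: guaranteed SPD for eps > 0
        cov = seg @ seg.T / win + eps * np.eye(n_ch)
        # Matrix logarithm via eigendecomposition (cov is symmetric)
        w, v = np.linalg.eigh(cov)
        log_cov = (v * np.log(w)) @ v.T
        # Vectorize the upper triangle as a Euclidean feature vector
        iu = np.triu_indices(n_ch)
        feats.append(log_cov[iu])
    return np.array(feats)
```

After this map, standard Euclidean models (or the small models alluded to in the abstract) can operate on SPD-derived features while respecting the curved geometry of the covariance manifold.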
Primary Area: Applications->Health / Medicine
Keywords: Speech neuroprostheses, electromyogram signals, SPD matrices, Riemannian geometry
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Flagged For Ethics Review: true
Submission Number: 3261