Keywords: neuroAI, speech decoding, neuroscience, transformer, seq2seq, MAE, scaling, attention
TL;DR: We present a multitask seq2seq Transformer with a day-adaptive Neural Hammer & Scalpel that decodes open-vocabulary text from intracortical signals, sets a new phoneme benchmark, and shows interpretable attention and favorable scaling.
Abstract: We present a transformer-based sequence-to-sequence model for human speech decoding from intracortical neural recordings. Unlike prior framewise recurrent approaches trained with connectionist temporal classification, our approach jointly models neural and linguistic dynamics and generates open-vocabulary word sequences directly from the neural signal. To address the limited-data regime of human brain–computer interface datasets, we adopt a multitask framework that combines phoneme and word decoding with auxiliary supervision from Mel-frequency cepstral coefficients, and we introduce Neural Hammer & Scalpel, a day-specific transformation that mitigates cross-day nonstationarity. The model establishes a new benchmark in phoneme decoding on the Willett et al. dataset and improves over previous end-to-end systems in word decoding. Attention visualizations reveal interpretable temporal chunking aligned with speech segments, shedding light on emergent neural dynamics. Finally, a scaling analysis shows favorable power-law trends, suggesting that continued data growth could yield substantial gains and positioning transformers as strong candidates for future brain-to-text systems.
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 11433