Keywords: neuroAI, speech decoding, neuroscience, transformer, seq2seq, MAE, scaling, attention
TL;DR: We present a multitask seq2seq Transformer with a day-adaptive Neural Hammer & Scalpel that decodes open-vocabulary text from intracortical signals, sets a new phoneme benchmark, and shows interpretable attention and favorable scaling.
Abstract: We present a transformer-based sequence-to-sequence model for human speech decoding from intracortical neural recordings. Unlike prior framewise recurrent approaches trained with connectionist temporal classification, our approach jointly models neural and linguistic dynamics and generates open-vocabulary word sequences directly from the neural signal. To address the limited-data regime of human brain–computer interface datasets, we adopt a multitask framework that combines phoneme and word decoding with auxiliary supervision from Mel-frequency cepstral coefficients, and we introduce Neural Hammer & Scalpel, a day-specific transformation that mitigates cross-day nonstationarity. The model establishes a new benchmark in phoneme decoding on the Willett et al. dataset and improves over previous end-to-end systems in word decoding. Attention visualizations reveal interpretable temporal chunking aligned with speech segments, shedding light on emergent neural dynamics. Finally, a scaling analysis shows favorable power-law trends, suggesting that continued data growth could yield substantial gains and positioning transformers as strong candidates for future brain-to-text systems.
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 11433