From Noisy Neural Time Series to Structured Language: A Foundation Model for Imagined Speech Decoding from EEG Signals

Published: 01 Mar 2026, Last Modified: 10 Apr 2026, Venue: ICLR 2026 TSALM Workshop Poster, License: CC BY 4.0
Presentation Attendance: No, we cannot present in-person
Keywords: Neural Time Series, JEPA, Representation Learning, Cross-Modal Alignment, EEG, Imagined Speech Synthesis
TL;DR: A JEPA-based model to synthesize intelligible imagined speech from non-invasive EEG signals
Abstract: Communicative brain–computer interfaces (BCIs) offer a promising pathway for restoring communication in patients affected by conditions such as amyotrophic lateral sclerosis (ALS). Among BCI paradigms, imagined speech decoding from non-invasive EEG is particularly attractive due to its portability and scalability. However, EEG constitutes a highly noisy neural time series, characterized by low signal-to-noise ratios, substantial inter-subject variability, and susceptibility to artifacts. Moreover, imagined speech arises from purely internal cognitive processes, producing weak and spatially diffuse neural activity. Extracting structured semantic information from such signals remains a significant challenge. To address this challenge, we present NeuroSpeak, a JEPA-based framework for sentence-level imagined speech generation from non-invasive EEG. Our approach combines masked neural signal modeling with vector-quantized latent discretization to learn robust EEG representations, which are aligned with language embeddings using a predictive alignment objective and decoded into natural language via a pretrained sequence model. We train and evaluate our model on the large-scale CHISCO corpus comprising over 20,000 imagined speech sentences under a subject-agnostic evaluation setting. The proposed framework achieves a semantic similarity score of 47.70% relative to ground-truth text, demonstrating generalization beyond subject-specific neural patterns. To the best of our knowledge, this represents the largest and most semantically diverse study of sentence-level imagined speech generation using non-invasive EEG.
Track: Research Track (max 4 pages)
Submission Number: 67
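
Illustrative note: the abstract mentions vector-quantized latent discretization but gives no implementation details. The following is a minimal sketch of the generic technique only (nearest-codebook lookup), not the authors' NeuroSpeak code. The function name `quantize`, the 2-D toy codebook, and the latent values are all hypothetical and chosen purely for illustration.

```python
# Minimal sketch of vector-quantized discretization (nearest-codebook lookup).
# All numbers below are made-up toy values, not taken from the paper.
from math import dist


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for z in latents:
        # Euclidean distance from this latent to every codebook vector
        distances = [dist(z, c) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Hypothetical 2-D codebook with three entries
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
# Hypothetical encoder outputs for four EEG patches
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7), (0.6, 0.7)]

tokens = quantize(latents, codebook)        # discrete token ids
quantized = [codebook[i] for i in tokens]   # quantized latent vectors
print(tokens)
```

In a trained model the codebook would be learned jointly with the encoder, and the resulting discrete tokens would feed the downstream alignment stage; the sketch shows only the lookup step.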