STELAR: Dual-Space Training of EEG Foundation Models for Transferable Representations

18 Sept 2025 (modified: 28 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: EEG foundation model; self-supervised learning; dual-space pretraining; masked waveform reconstruction; token-level representation alignment; EMA/momentum encoder; spatio-temporal cross-attention; linear probing; transferability; cross-subject generalization
TL;DR: An EEG foundation model with lean and stable dual-space pretraining, yielding stronger linear probing and more transferable representations.
Abstract: Electroencephalography (EEG) is a non-invasive technique that provides critical insights for diagnosing neurological disorders. However, leveraging EEG in machine learning remains difficult due to its inherently low signal-to-noise ratio (SNR), pronounced inter-subject variability, and heterogeneous channel configurations across datasets. These issues make it challenging to design a general-purpose encoder that reliably captures robust and transferable EEG representations. Most existing EEG foundation models adopt self-supervised learning frameworks, typically pairing a primary encoder with several auxiliary components. Although these auxiliary modules are intended to support representation learning, in practice they often dominate the optimization process, preventing the encoder from developing strong, generalizable features. Consequently, even well-trained models may fall short in downstream applications. To address this limitation, we propose STELAR, a novel EEG foundation model that concentrates training on the encoder while minimizing the role of auxiliary components. STELAR introduces a three-part dual-space pretraining strategy that integrates representation-space alignment with lightweight signal-space reconstruction: (i) visible-token alignment directly supervises encoder outputs, (ii) masked-token alignment enforces generative consistency through a compact prediction head, and (iii) linear masked reconstruction preserves fidelity to the original signals. This streamlined design substantially reduces auxiliary parameters and yields a cleaner, more effective pretraining pipeline than prior approaches. In addition, STELAR incorporates a spatio-temporal cross-attention encoder that jointly captures spatial dependencies across EEG channels and temporal dynamics over time. Empirical results demonstrate that STELAR converges rapidly, within 15 epochs, and consistently outperforms previous EEG foundation models by up to 5% under linear probing evaluation. All source code will be released publicly upon acceptance of this work.
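To make the three-part objective concrete, below is a minimal PyTorch sketch of how dual-space pretraining with an EMA/momentum teacher could be wired up. It is an illustration only: the module names, hidden sizes, the plain transformer used as a stand-in for the paper's spatio-temporal cross-attention encoder, the masking scheme, and the equal loss weights are all assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of STELAR-style dual-space pretraining (not the official code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenEncoder(nn.Module):
    """Stand-in encoder: embeds patched EEG tokens and runs a small transformer
    (the paper uses a spatio-temporal cross-attention encoder instead)."""
    def __init__(self, patch_len=200, dim=256, depth=4, heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, tokens, mask_token=None, mask=None):
        x = self.embed(tokens)                                  # (B, N, dim)
        if mask_token is not None:
            # Replace masked-token embeddings with a learnable mask token.
            x = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)
        return self.blocks(x)


class DualSpacePretrainer(nn.Module):
    def __init__(self, patch_len=200, dim=256, ema_decay=0.996):
        super().__init__()
        self.student = TokenEncoder(patch_len, dim)
        self.teacher = copy.deepcopy(self.student)              # EMA/momentum encoder
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pred_head = nn.Sequential(                         # compact prediction head
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.recon_head = nn.Linear(dim, patch_len)             # linear masked reconstruction
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        # Momentum update of the teacher toward the student.
        for ps, pt in zip(self.student.parameters(), self.teacher.parameters()):
            pt.mul_(self.ema_decay).add_(ps.detach(), alpha=1 - self.ema_decay)

    def forward(self, tokens, mask):
        # tokens: (B, N, patch_len) patched EEG; mask: (B, N) bool, True = masked.
        with torch.no_grad():
            target = self.teacher(tokens)                       # representation-space targets
        student_out = self.student(tokens, self.mask_token, mask)
        vis = ~mask
        # (i) visible-token alignment: directly supervise encoder outputs.
        loss_vis = F.mse_loss(student_out[vis], target[vis])
        # (ii) masked-token alignment: generative consistency via the prediction head.
        loss_mask = F.mse_loss(self.pred_head(student_out[mask]), target[mask])
        # (iii) linear masked reconstruction: fidelity to the original signal patches.
        loss_recon = F.mse_loss(self.recon_head(student_out[mask]), tokens[mask])
        return loss_vis + loss_mask + loss_recon                # equal weights assumed


if __name__ == "__main__":
    model = DualSpacePretrainer()
    eeg = torch.randn(2, 32, 200)                # (batch, tokens, samples per patch)
    mask = torch.rand(2, 32) < 0.5               # random 50% token masking (assumed ratio)
    loss = model(eeg, mask)
    loss.backward()
    model.update_teacher()
```

In this reading, the only trainable parts besides the encoder are a two-layer prediction head and a single linear reconstruction layer, which is what keeps the auxiliary parameter count small and the optimization focused on the encoder itself.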
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11219