VIBE: Vision transformer based experts network for SSVEP decoding

ICLR 2026 Conference Submission 22574 Authors

Published: 20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, License: CC BY 4.0
Keywords: EEG, SSVEP, ViT, BCI, MoE
TL;DR: VIBE leverages a ViT based model to extract EEG features, achieving state-of-the-art SSVEP classification performance on large datasets.
Abstract: Steady-state visual evoked potential based brain–computer interfaces (SSVEP-BCIs) have attracted wide attention for their high information transfer rate (ITR) and non-invasiveness. However, existing deep learning methods for SSVEP-BCI decoding have reached a performance bottleneck, as they struggle to fully extract the complex neural signal features required for robust decoding. Motivated by advances in vision and time series modeling, we present a \textbf{VI}sion Transformer \textbf{B}ased \textbf{E}xpert network (VIBE), a multi-stage deep learning framework for SSVEP classification. VIBE integrates a Vision Transformer (ViT) module that generates rich spatiotemporal representations with data and network enhancement modules in a decoder for frequency recognition. We evaluate VIBE on two large benchmark datasets, the Benchmark and BETA datasets, spanning 105 subjects in total. Notably, with just 0.4 seconds of stimulation, VIBE achieves ITRs of $263.8$ bits per minute (bpm) and $202.7$ bpm on the Benchmark and BETA datasets, respectively. Experimental results demonstrate that VIBE consistently outperforms state-of-the-art baselines in offline experiments, highlighting its effectiveness as a high-performance decoding strategy for SSVEP-BCIs.
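For reference, the ITR figures quoted in bits per minute are conventionally computed with the standard Wolpaw formula used throughout the SSVEP-BCI literature; the submission does not restate it, but it relates the number of targets $N$, the classification accuracy $P$, and the per-selection time $T$ in seconds (typically the stimulation window plus the gaze-shift interval):

\[
  \mathrm{ITR} = \frac{60}{T}\left[\log_2 N + P\log_2 P + (1-P)\log_2\!\frac{1-P}{N-1}\right] \ \text{bpm}.
\]

With a 0.4 s stimulation window, $T$ also includes whatever gaze-shift time the authors assume, so the reported $263.8$ bpm and $202.7$ bpm cannot be reproduced from the abstract alone.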
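The abstract only sketches the architecture at a high level (a ViT encoder over multichannel EEG feeding an expert-based decoder for frequency recognition). The following is a minimal, hypothetical PyTorch sketch of that general pattern, not the authors' actual VIBE implementation: all layer choices, hyperparameters, and names (`PatchEmbed`, `MoEHead`, `ViTSSVEP`) are assumptions introduced here for illustration.

```python
# Hypothetical sketch of a ViT-over-EEG encoder with a soft mixture-of-experts
# classification head, in the spirit of the abstract's description. Not the
# paper's architecture; all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a multichannel EEG window into non-overlapping temporal patches
    and project each patch to the transformer embedding dimension."""
    def __init__(self, n_channels=9, patch_len=25, embed_dim=64):
        super().__init__()
        self.proj = nn.Conv1d(n_channels, embed_dim,
                              kernel_size=patch_len, stride=patch_len)

    def forward(self, x):                # x: (batch, channels, time)
        x = self.proj(x)                 # (batch, embed_dim, n_patches)
        return x.transpose(1, 2)         # (batch, n_patches, embed_dim)

class MoEHead(nn.Module):
    """Soft mixture-of-experts classifier: a gating network weights the
    predictions of several small expert layers."""
    def __init__(self, embed_dim=64, n_experts=4, n_classes=40):
        super().__init__()
        self.gate = nn.Linear(embed_dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(embed_dim, n_classes) for _ in range(n_experts)
        )

    def forward(self, z):                               # z: (batch, embed_dim)
        w = torch.softmax(self.gate(z), dim=-1)         # (batch, n_experts)
        logits = torch.stack([e(z) for e in self.experts], dim=1)
        return (w.unsqueeze(-1) * logits).sum(dim=1)    # (batch, n_classes)

class ViTSSVEP(nn.Module):
    def __init__(self, n_channels=9, patch_len=25, embed_dim=64, depth=4,
                 n_heads=4, n_experts=4, n_classes=40, max_patches=64):
        super().__init__()
        self.embed = PatchEmbed(n_channels, patch_len, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, max_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = MoEHead(embed_dim, n_experts, n_classes)

    def forward(self, x):                       # x: (batch, channels, time)
        tokens = self.embed(x)
        tokens = tokens + self.pos[:, :tokens.size(1)]   # learned positions
        z = self.encoder(tokens).mean(dim=1)             # mean-pool patches
        return self.head(z)

# Example: a 0.4 s window at 250 Hz (100 samples) over 9 occipital channels,
# classified into 40 stimulation frequencies (the usual benchmark setup).
model = ViTSSVEP()
logits = model(torch.randn(8, 9, 100))    # -> (8, 40) class logits
```

The gating-weighted sum over expert outputs is one common way to realize an "experts network" on top of a shared encoder; the actual data and network enhancement modules described in the paper are not reflected in this sketch.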
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 22574