BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation
Abstract: Current EEG/MEG-to-text decoding systems suffer from three key limitations:
(1) reliance on teacher-forcing methods, which compromises robustness during inference,
(2) sensitivity to session-specific noise, hindering generalization across subjects, and
(3) misalignment between brain signals and linguistic representations caused by the over-dominance of pre-trained language models.
To overcome these challenges, we propose BrainECHO ($\textbf{B}$rain signal decoding via v$\textbf{E}$ctor-quantized spe$\textbf{C}$trogram reconstruction for W$\textbf{H}$isper-enhanced text generati$\textbf{O}$n), a multi-stage framework that employs decoupled representation learning to achieve state-of-the-art performance on both EEG and MEG datasets. Specifically, BrainECHO consists of three stages:
(1) Discrete autoencoding, which transforms continuous Mel spectrograms into a finite set of high-quality discrete representations for subsequent stages.
(2) Frozen alignment, where brain signal embeddings are mapped to corresponding Mel spectrogram embeddings in a frozen latent space, effectively filtering session-specific noise through vector-quantized reconstruction, yielding a 3.65% improvement in BLEU-4 score.
(3) Constrained decoding fine-tuning, which leverages the pre-trained Whisper model for audio-to-text translation, balancing signal adaptation with knowledge preservation, and achieving 74%-89% decoding BLEU scores without excessive reliance on teacher forcing.
BrainECHO remains robust under sentence-, session-, and subject-independent conditions as well as under Gaussian noise perturbation tests, demonstrating its potential for enhancing language-based brain-computer interfaces.
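The vector-quantization step in Stage 1 can be illustrated with a minimal sketch: each continuous spectrogram embedding is replaced by its nearest entry in a finite codebook, yielding a discrete token per frame. The codebook size, embedding dimension, and function names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # hypothetical: 512 discrete codes, 64-dim

def quantize(embeddings: np.ndarray):
    """Map each row of `embeddings` to its nearest codebook vector."""
    # Squared Euclidean distance from every embedding to every code.
    d = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                 # one discrete token per frame
    return idx, codebook[idx]              # token indices and quantized vectors

frames = rng.normal(size=(10, 64))         # e.g. 10 Mel-spectrogram frames
tokens, quantized = quantize(frames)
print(tokens.shape, quantized.shape)       # (10,) (10, 64)
```

In the full model a learned encoder/decoder pair surrounds this lookup, and the codebook itself is trained; the sketch only shows the nearest-neighbor discretization that makes the latent space finite.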
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: human-AI interaction
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 8175