Harnessing Pre-trained Language Models for EEG-based Epilepsy Detection

Published: 2025, Last Modified: 14 Nov 2025 · ICME 2025 · CC BY-SA 4.0
Abstract: Pre-trained large-scale models have driven significant advances in Natural Language Processing (NLP), inspiring their application to other domains, including physiological signals. However, pre-trained models remain little used in the analysis of physiological signals, particularly electroencephalograms (EEG). Specifically, applying self-supervised pre-training to EEG signals faces three key challenges: (1) pre-training self-supervised models demands substantial computational resources; (2) the low signal-to-noise ratio (SNR) of EEG signals can hinder the representation learning capabilities of these models; and (3) high-quality training datasets are scarce: EEG data is not as readily available as natural language data, as it is typically collected in limited datasets that predominantly contain normal (non-epileptic) signals. To address these challenges, we propose PLM2EEG, a method that leverages pre-trained language models for EEG analysis tasks. PLM2EEG has two core components: (1) EEG Tokenization, in which fixed-length EEG segments from each channel are treated as tokens, encapsulating local feature information and matching the input dimensions of pre-trained language models; channel and positional embeddings are added to each token to preserve the spatiotemporal structure of the signals. (2) A Frozen Pretrained Model, which retains the self-attention and feed-forward layers of a pre-trained large language model and is then fine-tuned specifically for EEG-related tasks, demonstrating its adaptability to this domain. Experiments on two large datasets show that PLM2EEG significantly outperforms existing self-supervised pre-trained models on EEG tasks, improves cross-dataset learning, and sets new benchmarks for EEG analysis.
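The EEG Tokenization step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the toy dimensions, and the random matrices standing in for learned projection and embedding parameters are all assumptions made for the example.

```python
import random

def tokenize_eeg(eeg, patch_len, d_model, seed=0):
    """Sketch of EEG Tokenization: split each channel into fixed-length
    segments ("tokens"), project each to the model width d_model, and add
    channel and positional embeddings to preserve spatiotemporal structure.
    The projection W and the embedding tables are random stand-ins for
    learned parameters (hypothetical; not taken from the paper)."""
    rnd = random.Random(seed)
    n_channels = len(eeg)
    n_patches = len(eeg[0]) // patch_len  # tokens per channel
    # Random stand-ins for learned parameters
    W = [[rnd.gauss(0, 1) for _ in range(d_model)] for _ in range(patch_len)]
    chan_emb = [[rnd.gauss(0, 1) for _ in range(d_model)] for _ in range(n_channels)]
    pos_emb = [[rnd.gauss(0, 1) for _ in range(d_model)] for _ in range(n_patches)]
    tokens = []
    for c in range(n_channels):
        for p in range(n_patches):
            patch = eeg[c][p * patch_len:(p + 1) * patch_len]
            # Linear projection of the raw segment to the LM's input width,
            # plus the channel and temporal-position embeddings
            tok = [sum(patch[i] * W[i][d] for i in range(patch_len))
                   + chan_emb[c][d] + pos_emb[p][d]
                   for d in range(d_model)]
            tokens.append(tok)
    # Flattened token sequence fed to the frozen pre-trained backbone
    return tokens

eeg = [[0.1 * i for i in range(16)] for _ in range(2)]  # 2 channels, 16 samples
seq = tokenize_eeg(eeg, patch_len=4, d_model=8)
print(len(seq), len(seq[0]))  # 8 tokens (2 channels x 4 patches), width 8
```

The resulting sequence would then pass through the frozen self-attention and feed-forward layers, with only the remaining parameters updated during fine-tuning.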