An EEG Dataset of Word-level Brain Responses for Semantic Text Relevance

Published: 12 Jul 2025, Last Modified: 05 Sept 2025
Venue: SIGIR '25: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
Readers: Everyone
License: CC BY-SA 4.0
Abstract: Electroencephalography (EEG) can enable non-invasive, real-time measurement of brain activity reflecting cognitive processes during human language processing. Previously released EEG datasets primarily capture brain signals recorded either during natural reading or within controlled psycholinguistic experimental settings. Given that information retrieval research depends on understanding and modelling relevance, we present a novel dataset including EEG data recorded while participants read text that is semantically relevant or irrelevant to self-selected topics. The dataset contains 23,270 time-locked (~0.7 s) word-level EEG recordings. Using these data, we conduct benchmark experiments with two evaluation protocols, cross-subject and within-subject, focusing on two prediction tasks: word relevance and sentence relevance. We report the performance of five well-known models on these tasks. Altogether, our dataset paves the way for advancing research on language relevance, brain input and feedback-based recommendation and retrieval systems, and the development of brain-computer interface (BCI) devices for online detection of language relevance. Our dataset and code are openly released at https://osf.io/xh3g5/wiki/home/ and on HuggingFace at https://huggingface.co/datasets/Quoron/EEG-semantic-text-relevance.
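The two evaluation protocols named in the abstract differ in how trials are divided between training and testing. The sketch below illustrates one common reading of each: cross-subject as leave-one-subject-out, and within-subject as a per-participant train/test split. The abstract does not state the exact splitting procedure, so both functions, the `records` layout, and the `test_fraction` parameter are illustrative assumptions, not the authors' implementation or the released dataset schema.

```python
from collections import defaultdict

# Hypothetical toy records: (subject_id, trial_index).
# This is NOT the schema of the released dataset.
records = [(s, t) for s in ("S01", "S02", "S03") for t in range(4)]


def cross_subject_splits(records):
    """Assumed protocol: hold out one participant, train on all the others."""
    subjects = sorted({s for s, _ in records})
    for held_out in subjects:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


def within_subject_splits(records, test_fraction=0.25):
    """Assumed protocol: train and test on disjoint trials of one participant."""
    by_subject = defaultdict(list)
    for r in records:
        by_subject[r[0]].append(r)
    for subject in sorted(by_subject):
        trials = by_subject[subject]
        # Keep at least one test trial per participant.
        n_test = max(1, int(len(trials) * test_fraction))
        yield subject, trials[:-n_test], trials[-n_test:]


if __name__ == "__main__":
    for held_out, train, test in cross_subject_splits(records):
        print("cross-subject", held_out, len(train), len(test))
    for subject, train, test in within_subject_splits(records):
        print("within-subject", subject, len(train), len(test))
```

In the cross-subject case the model never sees the test participant's brain signals during training, which is the harder setting for EEG because responses vary considerably between individuals. In practice a split would also need to keep words from the same sentence on the same side of the split to avoid leakage in the sentence relevance task.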