Visible to everyone since 04 Oct 2024. License: CC BY 4.0.
Electroencephalography (EEG) can enable non-invasive, real-time measurement of brain activity in response to human language processing. Previously released EEG datasets focus on brain signals measured either during completely natural reading or in strictly controlled psycholinguistic experimental settings. Since readers commonly regard some content as more semantically relevant than other content, we release a novel dataset for semantic text relevance containing $23{,}270$ time-locked (${\sim}0.7$ s) word-level EEG recordings acquired from participants who read both semantically relevant and semantically irrelevant text with respect to self-selected topics. Using these data, we present benchmark experiments on two prediction tasks (word relevance and sentence relevance) under two evaluation protocols: participant-independent and participant-dependent. We report the performance of five well-known models on these tasks. Our dataset and code are openly released. Altogether, our dataset paves the way for advancing research on language relevance and psycholinguistics, on recommendation and retrieval systems driven by brain input and feedback, and on the development of brain-computer interface (BCI) devices for online detection of language relevance.
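The two evaluation protocols differ in whether a test participant's data is ever seen during training. The sketch below illustrates that difference under stated assumptions: it takes "participant-independent" to mean leave-one-participant-out cross-validation and "participant-dependent" to mean a random split of each participant's own word-level samples. The function names, the 20% test fraction, and the per-sample participant labels are hypothetical and are not taken from the released code.

```python
import random
from collections import defaultdict


def participant_independent_splits(participants):
    """Leave-one-participant-out (assumed protocol).

    `participants` holds one participant ID per EEG sample. Each fold tests on
    every sample of one held-out participant and trains on everyone else.
    """
    for held_out in sorted(set(participants)):
        train = [i for i, p in enumerate(participants) if p != held_out]
        test = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train, test


def participant_dependent_splits(participants, test_fraction=0.2, seed=0):
    """Within-participant random split (assumed protocol, hypothetical fraction).

    Each participant's own samples are shuffled and divided into train/test,
    so a model is trained and evaluated on the same person.
    """
    rng = random.Random(seed)
    by_participant = defaultdict(list)
    for i, p in enumerate(participants):
        by_participant[p].append(i)
    for p in sorted(by_participant):
        idx = by_participant[p][:]
        rng.shuffle(idx)
        # Hold out at least one sample per participant.
        n_test = max(1, int(len(idx) * test_fraction))
        yield p, sorted(idx[n_test:]), sorted(idx[:n_test])


if __name__ == "__main__":
    # Toy labels: one participant ID per word-level EEG sample.
    participants = ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3", "P3", "P3"]
    for pid, train, test in participant_independent_splits(participants):
        print("independent", pid, len(train), len(test))
    for pid, train, test in participant_dependent_splits(participants):
        print("dependent", pid, len(train), len(test))
```

For the toy labels, the participant-independent folds hold out P1, P2, and P3 in turn, with train/test sizes of 7/3, 8/2, and 5/5. The participant-independent setting is the harder test of generalization to unseen readers. Note that a purely random word-level split in the dependent setting can place words from the same sentence in both sets; a real benchmark may split by sentence or document instead.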