Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants

Published: 23 Sept 2025 · Last Modified: 09 Oct 2025 · NeurIPS 2025 Workshop BrainBodyFM · CC BY 4.0
Keywords: fMRI, natural speech decoding, deep neural networks, contrastive learning, LLM embeddings, inter-subject variability, deep phenotyping, brain computer interface
TL;DR: We use deep learning to decode natural speech from fMRI; individual data quantity helps, but inter-subject variability hurts multi-subject training.
Abstract: We present a systematic investigation into decoding perceived natural speech from fMRI data in a participant-limited setting. Using a publicly available dataset of eight participants (LeBel et al., 2023), we demonstrate that deep neural networks trained with a contrastive objective can effectively decode unseen natural speech by retrieving perceived sentences from fMRI activity. We find that decoding performance scales with the amount of training data available per participant. In this data regime, multi-subject training does not improve decoding accuracy over the single-subject approach, and training on similar versus different stimuli across subjects has a negligible effect on decoding accuracy. Finally, we find that our decoders model both syntactic and semantic features, and that stories containing sentences with complex syntax or rich semantic content are more challenging to decode. While our results demonstrate the benefits of having extensive data per participant (deep phenotyping), they suggest that leveraging multi-subject data for natural speech decoding likely requires deeper phenotyping or a substantially larger cohort.
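The contrastive retrieval setup described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration in plain NumPy, not the authors' actual model: it assumes fMRI activity and candidate sentences have each been mapped to fixed-size embeddings (e.g., sentence vectors from an LLM), pairs them with a symmetric InfoNCE objective, and decodes by nearest-neighbor retrieval over candidate sentences. All function and variable names are invented for this sketch.

```python
import numpy as np

def info_nce_loss(brain_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    brain_emb, text_emb: (batch, dim) arrays; row i of each is a
    matched (fMRI window, sentence embedding) pair.
    """
    # L2-normalize so dot products become cosine similarities.
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = b @ t.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))     # diagonal entries are the positives

    def xent(l):
        # Numerically stable cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the brain->text and text->brain directions.
    return 0.5 * (xent(logits) + xent(logits.T))

def retrieve(brain_emb, text_emb):
    """Top-1 retrieval: for each fMRI embedding, return the index of the
    most similar candidate sentence embedding."""
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return (b @ t.T).argmax(axis=1)
```

In a real pipeline the brain embeddings would come from a trained encoder minimizing `info_nce_loss` by gradient descent; here, when the two embedding sets coincide, `retrieve` recovers the matching sentence for every fMRI window by construction.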
Submission Number: 34