Supplementary Material: zip
Track: Extended Abstract Track
Keywords: brain decoding, fMRI decoding, video, vision, neural data
TL;DR: We propose a retrieval-based method for decoding videos from human brain activity, using a multi-stream (semantic, visual, and audio) modelling approach.
Abstract: In this study, we present a novel multi-stream sensory approach for decoding video stimuli from human fMRI data. Leveraging a dataset of 1,000 short video clips and associated fMRI data, we explore the integration of visual, textual, and audio modalities to enhance the accuracy of brain decoding models. We develop subject-specific encoding models that predict brain activity from modality-specific embeddings and apply functional alignment across subjects to improve model generalization. Our decoding framework employs ridge regression within identified regions of interest (ROIs) for each modality, followed by a retrieval process based on Euclidean-distance search. The results demonstrate that integrating multiple sensory streams significantly enhances the performance of decoding models, with the Video+Text+Audio combination achieving the highest identification and retrieval accuracy.
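The decoding step described in the abstract (ridge regression followed by Euclidean retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the array sizes and the `alpha` value are arbitrary, and the helper names `fit_ridge` and `retrieve` are hypothetical. It also assumes the regression maps voxel responses to a stimulus embedding, and it omits ROI selection, functional alignment, and the fusion of multiple streams.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def retrieve(predicted, candidates):
    """Rank candidates for each prediction by Euclidean distance (nearest first)."""
    diffs = predicted[:, None, :] - candidates[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return np.argsort(dists, axis=1)


# Synthetic stand-ins (assumption): rows of X are voxel responses within one
# ROI, rows of Y are embeddings of the corresponding clips for one modality.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_dims = 200, 20, 50, 8
true_map = rng.normal(size=(n_voxels, n_dims))
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
Y_train = X_train @ true_map + 0.1 * rng.normal(size=(n_train, n_dims))
Y_test = X_test @ true_map  # candidate set: one embedding per test clip

# Fit on training clips, predict embeddings for held-out brain responses,
# then look up the nearest candidate embedding for each prediction.
W = fit_ridge(X_train, Y_train, alpha=1.0)
ranking = retrieve(X_test @ W, Y_test)

# Top-1 identification: the nearest candidate is the clip that was shown.
top1_accuracy = float(np.mean(ranking[:, 0] == np.arange(n_test)))
```

Under these assumptions, one simple way to combine streams would be to fit one such model per modality and concatenate the predicted embeddings before retrieval; the abstract does not specify how the streams are fused.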
Submission Number: 13