Keywords: fMRI, brain decoding, video reconstruction, cross-subject generalization, visual cortex, contrastive learning, zero-shot decoding
TL;DR: A cognitive-process-inspired fMRI-to-video framework that hierarchically aligns brain features with CLIP representations and generalizes to unseen subjects without subject-specific training.
Abstract: Subject-agnostic brain decoding, which aims to reconstruct continuous visual experiences from fMRI without subject-specific training, holds great potential for clinical applications. However, this direction remains underexplored due to challenges in cross-subject generalization and the complex nature of brain signals.
In this work, we propose Visual Cortex Flow Architecture (VCFlow), a novel hierarchical decoding framework that explicitly models the ventral-dorsal architecture of the human visual system to learn multi-dimensional representations. By disentangling and leveraging features from early visual cortex, ventral, and dorsal streams, VCFlow captures diverse and complementary cognitive information essential for visual reconstruction.
Furthermore, we introduce a feature-level contrastive learning strategy that improves the extraction of subject-invariant semantic representations, allowing the model to generalize to previously unseen subjects.
Compared with conventional pipelines, which require more than 12 hours of per-subject data and heavy computation, VCFlow loses only 7% accuracy on average while generating each reconstructed video in 10 seconds without any retraining, offering a fast and clinically scalable solution.
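As an illustration of what feature-level contrastive alignment between brain features and CLIP representations can look like, the following is a minimal sketch of a generic symmetric InfoNCE objective. It is not taken from the paper: the function names, the temperature value, and the assumption that row i of each matrix comes from the same stimulus are all hypothetical choices made for this example, and VCFlow's actual loss may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(brain_feats, clip_feats, temperature=0.07):
    """Symmetric InfoNCE loss between two (batch, dim) feature matrices.

    Assumption: row i of brain_feats and row i of clip_feats describe the
    same stimulus (a positive pair); all other rows in the batch act as
    negatives. The temperature is an illustrative default, not a value
    reported by the authors.
    """
    b = l2_normalize(brain_feats)
    c = l2_normalize(clip_feats)
    logits = b @ c.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(b))
    # Brain-to-CLIP direction: each brain row should pick its own CLIP row.
    loss_b2c = -log_softmax(logits, axis=1)[idx, idx].mean()
    # CLIP-to-brain direction: each CLIP row should pick its own brain row.
    loss_c2b = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_b2c + loss_c2b)
```

Under these assumptions, brain features that lie close to their paired CLIP embeddings yield a lower loss than unrelated features, which is the pressure that pulls matched pairs together regardless of which subject produced the fMRI signal.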
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 6157