Keywords: fMRI-to-Video Reconstruction, Multi-Shot Video Reconstruction, fMRI-to-Text Decoding
TL;DR: MindShot pioneers multi-shot fMRI video reconstruction by explicitly decoupling mixed signals into shot-specific segments and decoding semantic keyframe captions via LLMs, enabling accurate recovery of complex visual narratives.
Abstract: Reconstructing dynamic videos from fMRI is important for understanding visual cognition and enabling vivid brain-computer interfaces. However, current methods are limited to single-shot clips, performing alignment and reconstruction at the video level, and therefore fail to address the multi-shot nature of real-world experiences. To bridge this gap, we propose MindShot, a novel shot-level framework that reconstructs multi-shot videos from fMRI via a divide-and-decode strategy. Specifically, our framework consists of three stages: (1) Shot Decomposition: We first identify shot boundaries within the fMRI signal, then decompose the mixed signals into distinct, shot-specific segments. (2) Keyframe Decoding: Each segment is decoded into a textual description representing the keyframe of its corresponding shot. (3) Video Reconstruction: The final video is generated from these keyframe captions, which mitigates noise from fMRI redundancy. To address the lack of data for multi-shot reconstruction, we construct a large-scale multi-shot fMRI-video dataset synthesized from existing datasets. Experimental results show that our framework outperforms state-of-the-art methods in both single-shot and multi-shot reconstruction fidelity. Ablation studies confirm the critical role of shot-level reconstruction in multi-shot video reconstruction, with decomposition improving the CLIP similarity of decoded captions by 71.8%. This work establishes a new paradigm for multi-shot fMRI reconstruction, enabling accurate recovery of complex visual narratives through explicit decomposition and semantic prompting.
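The divide-and-decode flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how shot boundaries are detected or how captions are decoded, so the cosine-similarity threshold, the function names (`find_boundaries`, `split_segments`, `divide_and_decode`), and the `decode_caption` callback are all hypothetical stand-ins chosen only to show the control flow of stages (1) and (2).

```python
from math import sqrt
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one fMRI time point as a voxel vector (assumed layout)


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_boundaries(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i starts a new segment.

    Assumption: a boundary is declared when similarity to the previous
    frame drops below `threshold`. The paper's detector may differ.
    """
    return [
        i for i in range(1, len(frames))
        if cosine(frames[i - 1], frames[i]) < threshold
    ]


def split_segments(frames: Sequence[Frame], boundaries: Sequence[int]) -> List[List[Frame]]:
    """Cut the frame sequence into contiguous segments at the boundaries."""
    if not frames:
        return []
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(frames)]
    return [list(frames[s:e]) for s, e in zip(starts, ends)]


def divide_and_decode(
    frames: Sequence[Frame],
    decode_caption: Callable[[List[Frame]], str],
    threshold: float = 0.5,
) -> List[str]:
    """Stages (1) and (2): segment the signal, then caption each segment.

    `decode_caption` is a hypothetical placeholder for an fMRI-to-text model.
    """
    segments = split_segments(frames, find_boundaries(frames, threshold))
    return [decode_caption(seg) for seg in segments]
```

In this sketch the returned list holds one caption per detected shot; stage (3) would pass those captions to a text-to-video generator, which is omitted here because the abstract does not name one.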
Primary Area: applications to neuroscience & cognitive science
Submission Number: 18015