Keywords: fMRI-to-Image Reconstruction, Coarse-to-Fine Generation, Scale-wise Autoregressive Modeling, Scale-aware Neural Guidance
TL;DR: MindHier, a coarse-to-fine autoregressive framework, uses scale-aware guidance to inject hierarchical neural features for fMRI-to-image reconstruction, surpassing diffusion models in speed, stability, and semantic accuracy.
Abstract: Reconstructing visual stimuli from fMRI signals is a central challenge bridging machine learning and neuroscience. Recent diffusion-based methods typically map fMRI activity to a single neural embedding, using it as static guidance throughout the entire generation process. However, this fixed guidance collapses hierarchical neural information and is misaligned with the stage-dependent demands of image reconstruction. In response, we propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier introduces three components: a Hierarchical fMRI Encoder to extract multi-level neural embeddings, a Hierarchy-to-Hierarchy Alignment scheme to enforce layer-wise correspondence with CLIP features, and a Scale-Aware Coarse-to-Fine Neural Guidance strategy to inject these embeddings into autoregression at matching scales. These designs make MindHier an efficient and cognitively aligned alternative to diffusion-based methods by enabling a hierarchical reconstruction process that synthesizes global semantics before refining local details, akin to human visual perception. Extensive experiments on the NSD dataset show that MindHier achieves superior semantic fidelity, 4.67$\times$ faster inference, and more deterministic results than the diffusion-based baselines.
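Illustrative sketch (not the authors' implementation): the abstract says hierarchical embeddings are injected "at matching scales" but does not give the matching rule. The snippet below assumes a simple proportional rule, in which the coarsest autoregressive scales receive the most semantic embedding level and the finest scales receive the most detail-oriented one. The function name `assign_levels_to_scales` and the indexing convention (scale 0 = coarsest, level 0 = most semantic) are hypothetical.

```python
from typing import List


def assign_levels_to_scales(num_levels: int, num_scales: int) -> List[int]:
    """Return, for each autoregressive scale, the index of the neural
    embedding level used as guidance at that scale.

    Assumed convention (not taken from the paper):
      scale 0 is the coarsest scale, level 0 is the most semantic embedding.
    The mapping is proportional and non-decreasing, so guidance moves from
    global semantics to local detail as generation proceeds.
    """
    if num_levels < 1 or num_scales < 1:
        raise ValueError("num_levels and num_scales must be positive")
    return [
        min(scale * num_levels // num_scales, num_levels - 1)
        for scale in range(num_scales)
    ]


if __name__ == "__main__":
    # Example: 3 embedding levels spread over 10 generation scales.
    print(assign_levels_to_scales(3, 10))
```

Under this assumed rule, a static-guidance baseline corresponds to `num_levels = 1`, where every scale receives the same embedding.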
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 8887