['3c3', '< Abstract: Although existing fMRI-to-image reconstruction methods could predict highquality images, they do not explicitly consider the semantic gap between training and testing data, resulting in reconstruction with unstable and uncertain semantics. This paper addresses the problem of generalized fMRI-to-image reconstruction by explicitly alleviates the semantic gap. Specifically, we leverage the pre-trained CLIP model to map the training data to a compact feature representation, which essentially extends the sparse semantics of training data to dense ones, thus alleviating the semantic gap of the instances nearby known concepts (i.e., inside the training super-classes). Inspired by the robust low-level representation in fMRI data, which could help alleviate the semantic gap for instances that far from the known concepts (i.e., outside the training super-classes), we leverage structural information as a general cue to guide image reconstruction. Further, we quantify the semantic uncertainty based on probability density estimation and achieve Generalized fMRI-to-image reconstruction by adaptively integrating Expanded Semantics and Structural information (GESS) within a diffusion process. Experimental results demonstrate that the proposed GESS model outperforms state-ofthe-art methods, and we propose a generalized scenario split strategy to evaluate the advantage of GESS in closing the semantic gap. Our codes are available at https://github.com/duolala1/GESS.', '---', "> Abstract: Existing fMRI-to-image reconstruction methods often produce high-quality images but frequently overlook the critical semantic gap between training and testing data, leading to semantically inconsistent and ambiguous reconstructions. This paper explicitly addresses the challenge of generalized fMRI-to-image reconstruction by directly confronting this semantic disparity. 
We introduce a novel approach that leverages a pre-trained CLIP model to project training data into a compact, dense feature representation, thereby transforming sparse training semantics into dense ones, particularly for concepts within known super-classes. To address instances far from known concepts (i.e., outside the training super-classes), we leverage the robust low-level representations inherent in fMRI data, employing structural information as a general and transferable cue to guide image reconstruction. The proposed method quantifies semantic uncertainty using probability density estimation and achieves Generalized fMRI-to-Image Reconstruction through an Adaptive Integration of Expanded Semantics and Structural Information (GESS) within a diffusion process. Experimental results demonstrate that GESS outperforms state-of-the-art methods. Additionally, we introduce a novel generalized scenario split strategy to rigorously assess GESS's efficacy in bridging the semantic gap. Our code is publicly available at https://github.com/duolala1/GESS.", '6,14c6,20', '< Functional magnetic resonance imaging (fMRI) is a powerful tool for studying the human brain and visual system, as it provides a non-invasive way to measure neural activity. Image reconstruction from fMRI data is important for studying visual representation in the cortex and for developing the vivid "reading the mind" brain-computer interface (BCI) technology [10,19,11].', '< High-quality fMRI-to-image reconstruction is a typical cross-modality problem [14,28] and suffers from severe ill-posedness [2]. Existing state-of-the-art methods leverage the data-driven scheme to address such ill-posed problem by learning data prior from training data. However, training data are often collected as a limited number of instances [19], and real-world images are distributed in a wide, broad semantic space with a long-tail distribution [12]. 
This brings about the problem of the semantic gap, the semantics of testing instances may be unknown in training stage.', '< Addressing the semantic gap between the training data collected from the laboratory and the testing instances in the real world helps develop reconstructions for generalized fMRI-to-image scenarios and significantly promote the application of BCI. However, previous methods put too much attention on improving the image quality while less focus on the semantic accuracy of the reconstructed images, which brings two problems. Unstable semantics: The limited samples in each super-class fails to form a compact feature space, which may result in incorrect decision boundaries and inability to estimate robust semantics even located within the known concepts (Fig. 1c). This is what we called the inside-space gap (ISG). Uncertain semantics: The concepts covered by the training set is not enough, resulting in uncertainty in the prediction of test samples with unknown concepts (more like a zero-shot problem), which is called the outside-space gap (OSG). Traditional methods assume that the training set covers all the semantics in the test set (Fig. 1b), and ignore the semantic gap caused by unknown samples in reality.', '< To this end, this paper addresses the generalized fMRI-to-image reconstruction problem by explicitly alleviating the semantic gap. To deal with the instances within known semantic space (ISG problem), we map the fMRI signals to a compact semantic space via a pre-trained Contrastive Language-Image Pre-Training (CLIP) model [23]. To deal with the instances within unknown semantic space (OSG problem), we propose to use the structure information as a transferable cue to guide the reconstruction, which is inspired by the robust and redundant low-level representation in visual cortex [19]. 
As it is difficult to find a hard boundary to define ISG and OSG cases for a given instance, we quantify its semantic confidence by probability density estimation on the training semantics and adopt the likelihood as the contribution indicator. Finally, we achieve Generalized fMRI-to-image reconstruction by adaptively integrating Expanded Semantics and Structural information (GESS) in a diffusion process.', '< Our contributions in this paper could be summarized as:', '< • We explicitly address the generalized fMRI-to-image reconstruction problem and formulate its solution as alleviating the semantic gap within known and unknown semantic subspaces.', '< • We propose a CLIP based method to expand the fMRI features to a compact semantic space to alleviate the inside-space gap, and a structural information guided diffusion model to alleviate the outside-space gap.', '< • We construct a confidence indicator by quantifying the semantic similarity between a given instance and the training data, based on which we propose GESS to achieve generalized fMRI-to-image reconstruction by adaptively weighting the semantic and structural information.', "< • Our experimental results demonstrate that the proposed GESS model outperforms the classical and state-of-the-art methods. Additionally, we propose a dataset split method to construct a generalized fMRI-to-image scenario, which allows us to further evaluate the model's generalization ability.", '---', '> Functional magnetic resonance imaging (fMRI) is a powerful neuroimaging technique for studying the human brain and visual system, offering a non-invasive window into neural activity. 
High-quality image reconstruction from fMRI data is pivotal for advancing our understanding of visual representation in the cortex and for developing sophisticated "mind-reading" brain-computer interface (BCI) technologies [10,19,11].', '> ', '> However, fMRI-to-image reconstruction is inherently a challenging cross-modality problem [14,28] characterized by severe ill-posedness [2]. While state-of-the-art methods employ data-driven schemes to learn priors from training data, these datasets are often limited in size [19]. Crucially, real-world images occupy a vast and semantically diverse space, typically exhibiting a long-tail distribution [12]. This fundamental mismatch introduces a significant "semantic gap," where the semantics of testing instances may be entirely unknown during the training phase.', '> ', '> Addressing this semantic gap between laboratory-collected training data and real-world testing instances is essential for developing generalized fMRI-to-image reconstruction systems and significantly promoting BCI applications. Previous research has predominantly focused on enhancing image quality, often at the expense of semantic accuracy in the reconstructed images. This oversight leads to two critical problems:', '> *   **Unstable Semantics (Inside-Space Gap - ISG):** Limited samples within each super-class can prevent the formation of a compact feature space, leading to inaccurate decision boundaries and unreliable semantic estimations, even for concepts within the known training distribution (Fig. 1c).', '> *   **Uncertain Semantics (Outside-Space Gap - OSG):** The training set may not adequately cover the full spectrum of real-world concepts, resulting in high uncertainty in predictions for test samples with novel or unknown concepts (akin to a zero-shot problem). Traditional methods often assume that the training set encompasses all test set semantics (Fig. 
1b), thereby neglecting the semantic gap posed by unseen samples.', '> ', '> To this end, this paper explicitly addresses the generalized fMRI-to-image reconstruction problem by directly alleviating the semantic gap. For instances within the known semantic space (ISG problem), we project fMRI signals into a dense and continuous semantic manifold using a pre-trained Contrastive Language-Image Pre-Training (CLIP) model [23]. For instances within the unknown semantic space (OSG problem), we propose to leverage structural information as a transferable cue to guide reconstruction, inspired by the robust and redundant low-level representations found in the visual cortex [19]. Recognizing the difficulty in defining a rigid boundary between ISG and OSG cases, we quantify semantic confidence through probability density estimation on the training semantics, using this likelihood as an adaptive contribution indicator. Ultimately, we achieve Generalized fMRI-to-image reconstruction by adaptively integrating Expanded Semantics and Structural information (GESS) within a diffusion process.', '> ', '> Our contributions are summarized as follows:', '> *   We explicitly define and address the generalized fMRI-to-image reconstruction problem, formulating its solution as alleviating the semantic gap across known and unknown semantic subspaces.', '> *   We propose a CLIP-based method to expand fMRI features into a compact semantic space, thereby alleviating the inside-space gap, and introduce a structural information-guided diffusion model to mitigate the outside-space gap.', '> *   We construct a novel confidence indicator by quantifying the semantic similarity between a given instance and the training data, enabling GESS to achieve generalized fMRI-to-image reconstruction through adaptive weighting of semantic and structural information.', "> *   Our extensive experimental results demonstrate that the proposed GESS model significantly outperforms classical and state-of-the-art 
methods. Furthermore, we introduce a novel dataset split strategy to construct a more rigorous generalized fMRI-to-image scenario, allowing for a robust evaluation of our model's generalization capabilities.", '217d222', '< ']
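
The confidence indicator described in the introduction (probability density estimation on the training semantics, with the likelihood used to weight semantic against structural information) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the GESS code: the isotropic Gaussian kernel, the `bandwidth` value, the normalization by the highest training-point density, the linear blend, and the function names `gaussian_kde_log_density`, `semantic_weight`, and `blend_guidance` are all hypothetical.

```python
import math


def gaussian_kde_log_density(query, train, bandwidth=0.5):
    """Log of an isotropic Gaussian kernel density estimate at `query`.

    `train` is a list of embedding vectors (hypothetical stand-ins for the
    training semantics); `bandwidth` is an assumed kernel width.
    """
    dim = len(query)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for x in train:
        sq_dist = sum((q - xi) ** 2 for q, xi in zip(query, x))
        log_terms.append(log_norm - sq_dist / (2.0 * bandwidth ** 2))
    # Log-mean-exp keeps the estimate stable for queries far from the data.
    peak = max(log_terms)
    mean_exp = sum(math.exp(t - peak) for t in log_terms) / len(log_terms)
    return peak + math.log(mean_exp)


def semantic_weight(query, train, bandwidth=0.5):
    """Map the query's density to a weight in [0, 1].

    Assumed normalization: divide by the highest density found at any
    training point and clamp at 1.
    """
    reference = max(gaussian_kde_log_density(x, train, bandwidth) for x in train)
    log_ratio = gaussian_kde_log_density(query, train, bandwidth) - reference
    return min(1.0, math.exp(log_ratio))


def blend_guidance(semantic, structural, weight):
    """Linear blend of two guidance vectors (an assumed combination rule)."""
    return [weight * s + (1.0 - weight) * t for s, t in zip(semantic, structural)]


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
    for name, query in (("near known concepts", [0.05, 0.05]),
                        ("far from known concepts", [5.0, 5.0])):
        print(name, semantic_weight(query, train))
```

Under these assumptions, a query that falls in a dense region of the training semantics gets a weight near 1, so semantic guidance dominates (the ISG case), while a query far from every training point gets a weight near 0, so structural guidance dominates (the OSG case).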
