MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery

Reese Kneeland; Cesar Torrico; Tong Chen; Jordyn Antonio Ojeda; Shubh Khanna; Jonathan Xu; Paul Steven Scotti; Thomas Naselaris

Published: 23 Sept 2025, Last Modified: 24 Nov 2025, Venue: NeurIPS 2025 Workshop BrainBodyFM, License: CC BY 4.0

Keywords: fMRI, mental imagery, decoding

TL;DR: MIRAGE is a method designed for cross-decoding mental images from fMRI data, achieving state-of-the-art results on the NSD-Imagery benchmark by leveraging informed architectural choices.

Abstract: To be useful for downstream applications, vision decoding models that are trained to reconstruct seen images from human brain activity must be able to generalize to internally generated visual representations, i.e., mental images. In an analysis of the recently released NSD-Imagery dataset, we demonstrated that while some modern vision decoders perform quite well on mental image reconstruction, others fail, and that state-of-the-art (SOTA) performance on seen image reconstruction is no guarantee of SOTA performance on mental image reconstruction. Motivated by these findings, we developed MIRAGE, a method explicitly designed to train on vision datasets and cross-decode mental images from brain activity. MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model. Feature metrics and human raters establish MIRAGE as SOTA for mental image reconstruction on the NSD-Imagery benchmark. With ablation analysis, we show that mental image reconstruction works best when decoders use image features with relatively few dimensions and include guidance from text-based features and from both high- and low-level image-based features. Our work indicates that, given the right architecture, existing large-scale datasets using external stimuli are viable training data for decoding mental images, and it warrants optimism about the future success and utility of mental image reconstruction.
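
To make the idea of a linear backbone concrete, here is a minimal sketch, not the authors' implementation. It assumes the linear map is a ridge regression from voxel responses to a low-dimensional feature vector; the abstract says only that the backbone is linear. All array sizes, the `fit_ridge` helper, and the synthetic data are hypothetical stand-ins.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)

# Hypothetical sizes: trials, voxels, and a deliberately small feature dimension.
n_trials, n_voxels, n_dims = 200, 50, 8

# Synthetic stand-ins for fMRI responses to seen images and their target features.
true_W = rng.normal(size=(n_voxels, n_dims))
X_seen = rng.normal(size=(n_trials, n_voxels))
Y_seen = X_seen @ true_W + 0.1 * rng.normal(size=(n_trials, n_dims))

# Train on "vision" trials only.
W = fit_ridge(X_seen, Y_seen, alpha=1.0)

# Cross-decode: apply the same linear map to (synthetic) imagery trials.
X_imagery = rng.normal(size=(5, n_voxels))
pred_features = X_imagery @ W  # would serve as conditioning input to a diffusion model
```

In this sketch the decoder never sees imagery data during fitting, which mirrors the train-on-vision, test-on-imagery setup the abstract describes. The diffusion model that would consume `pred_features` is omitted.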

Submission Number: 9
