Generative Multimodal Decoding: Reconstructing Images and Text from Human fMRI

Published: 27 Oct 2023, Last Modified: 09 Nov 2023
Venue: DGM4H NeurIPS 2023 Poster
Keywords: fMRI decoding, generative models, multimodal, neural activity
TL;DR: We propose a neuroscience-inspired framework for multimodal decoding that aligns fMRI, text, and image representations to reconstruct semantic captions and photorealistic images directly from brain activity
Abstract: The human brain processes large amounts of visual information through complex neural mechanisms, and functional MRI (fMRI) makes it possible to decode some of this information from recorded brain activity patterns. We present an approach for reconstructing images and captions directly from fMRI data, with a focus on brain captioning because it offers more flexibility than image decoding. We use the Natural Scenes fMRI dataset, which contains brain recordings from subjects viewing images. Our method uses state-of-the-art image captioning and diffusion models for multimodal decoding: we train regression models that map fMRI data to textual and visual features, and we incorporate depth estimation to guide image reconstruction. The key contribution is a multimodal framework that aligns neural and deep learning representations to generate both semantic captions and photorealistic images from brain activity. We demonstrate quantitative improvements over prior work in captioning and in the spatial relationships of reconstructed images. Together, these results advance brain decoding through an integrated vision-language approach. By combining high-level semantic text with low-level visual depth information, the decoding framework offers new insight into human visual cognition and could support future applications in brain-computer interfaces, neuroscience, and AI.
Submission Number: 3
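
The regression step mentioned in the abstract (mapping fMRI data to textual or visual features) can be illustrated with a minimal sketch. The abstract does not say which regression model is used, so this example assumes ridge regression in closed form, and it runs on synthetic data rather than real recordings. The names `fit_ridge`, `predict`, and `columnwise_corr`, the array sizes, and the `alpha` value are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI responses.
    Y: (n_trials, n_dims) target feature vectors (e.g. text or image embeddings).
    No intercept is fitted; inputs are assumed to be roughly zero-mean.
    """
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, X.T @ Y)

def predict(X, W):
    """Map fMRI responses to predicted feature vectors."""
    return X @ W

def columnwise_corr(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    denom = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return (a * b).sum(axis=0) / denom

# Synthetic stand-in for real data (assumption: a linear voxel-to-feature map).
rng = np.random.default_rng(0)
n_trials, n_voxels, n_dims = 200, 50, 8
true_W = rng.normal(size=(n_voxels, n_dims))
fmri = rng.normal(size=(n_trials, n_voxels))
features = fmri @ true_W + 0.1 * rng.normal(size=(n_trials, n_dims))

# Fit on the first 160 trials and evaluate on the held-out 40.
train, test = slice(0, 160), slice(160, 200)
W = fit_ridge(fmri[train], features[train], alpha=1.0)
pred = predict(fmri[test], W)
corr = columnwise_corr(pred, features[test])
print("mean held-out correlation:", corr.mean())
```

In a pipeline like the one described, the predicted feature vectors would then be passed to a captioning or diffusion model. That stage is omitted here because the abstract gives no detail on how it is wired up.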