FLORA: A Unified Generalist Model for Visual Brain Decoding via Multimodal Neural Embeddings

Published: 31 Jan 2025 (last modified: 04 Feb 2025) · OpenReview Archive Direct Upload · License: CC BY 4.0
Abstract: Decoding visual information from neural data using artificial intelligence enhances our understanding of the human visual system. However, simultaneously acquiring paired neural data across modalities is challenging, leading most existing approaches to process these signals independently. This neglects their complementary characteristics and hinders decoding performance. In this study, we introduce FLORA, an end-to-end generalist model designed to integrate cross-modal neural data—including EEG, MEG, and fMRI—to construct a unified neural representation. FLORA employs multimodal large language models (MLLMs) alongside multimodal adapters and specialized diffusion model decoders, achieving superior performance on downstream tasks (e.g., neural signal retrieval and visual stimulus reconstruction) compared to single-modal approaches. By leveraging high-performance models, FLORA minimizes the number of parameters in the alignment and fusion layers, ensuring cost-effective fine-tuning. This design facilitates efficient training and seamless integration of extra modalities and datasets. Our approach holds promise for advancing our understanding of the brain's visual mechanisms and fostering new insights within the cognitive science and brain-computer interface communities. Our code is available at https://anonymous.4open.science/r/FLORA-2C4A.
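The core idea of aligning heterogeneous neural signals can be illustrated with a minimal sketch: one lightweight adapter per modality projects raw signals into a shared embedding space, so that only the small adapters need fine-tuning while large pretrained backbones stay frozen. The channel counts, dimensions, and linear adapters below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality input dimensions (illustrative only).
MODALITY_DIMS = {"eeg": 64, "meg": 306, "fmri": 1024}
SHARED_DIM = 256  # dimensionality of the unified neural embedding

# One small linear adapter per modality; in the paper's setting these would
# be the cheap-to-tune alignment layers, while the large pretrained
# backbone (omitted here) remains frozen.
adapters = {m: rng.normal(0.0, 0.02, size=(d, SHARED_DIM))
            for m, d in MODALITY_DIMS.items()}

def embed(modality: str, signal: np.ndarray) -> np.ndarray:
    """Project a raw neural signal into the shared embedding space."""
    z = signal @ adapters[modality]
    # Unit-normalize so embeddings are directly comparable across modalities.
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Embeddings from different modalities land in one space, so they can be
# compared directly, e.g. cosine similarity for neural signal retrieval.
eeg_emb = embed("eeg", rng.normal(size=(1, 64)))
fmri_emb = embed("fmri", rng.normal(size=(1, 1024)))
similarity = float(eeg_emb @ fmri_emb.T)
```

Because the adapters are tiny relative to the frozen backbone, adding a new modality or dataset only requires training another adapter, which matches the cost-effective extensibility the abstract describes.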
