MB2C: Multimodal Bidirectional Cycle Consistency for Learning Robust Visual Neural Representations

Published: 20 Jul 2024, Last Modified: 21 Jul 2024. MM 2024 Poster. License: CC BY 4.0
Abstract: Decoding human visual representations from brain activity data is a challenging but arguably essential task for understanding both the real world and the human visual system. However, decoding semantically similar visual representations from brain recordings is difficult, especially for electroencephalography (EEG), which has excellent temporal resolution but limited spatial precision. Prevailing methods mainly focus on matching brain activity data with the corresponding stimulus responses using contrastive learning. They rely on massive amounts of high-quality paired data and overlook the fact that semantically aligned modalities may be distributed in distinct regions of the latent space. This paper proposes a novel Multimodal Bidirectional Cycle Consistency (MB2C) framework for learning robust visual neural representations. Specifically, we utilize a dual-GAN architecture to generate modality-related features and translate them back into the corresponding semantic latent space, closing the modality gap and ensuring that embeddings from different modalities with similar semantics lie in the same region of the representation space. We perform zero-shot tasks on the ThingsEEG dataset, as well as EEG classification and image reconstruction on the EEGCVPR40 dataset, achieving state-of-the-art performance compared with other baselines.
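
To make the bidirectional cycle-consistency idea above concrete, the following sketch is a minimal illustration, not the paper's implementation: it assumes PyTorch, hypothetical translator MLPs (g_e2i, g_i2e), arbitrary feature dimensions, and an L1 cycle penalty, and it omits the adversarial discriminators of the full dual-GAN. EEG embeddings are translated into the image-embedding space and back (and vice versa), and the round-trip reconstruction error is penalized so that semantically matched embeddings stay in the same region of the latent space.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Translator(nn.Module):
    """Simple MLP that maps embeddings from one modality's latent space to the other's (hypothetical)."""
    def __init__(self, dim_in, dim_out, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim_out),
        )

    def forward(self, x):
        return self.net(x)

def bidirectional_cycle_loss(eeg_emb, img_emb, g_e2i, g_i2e, lam=1.0):
    """Cycle consistency in both directions:
    EEG -> image latent -> EEG, and image -> EEG latent -> image."""
    eeg_rec = g_i2e(g_e2i(eeg_emb))   # round trip through the image latent space
    img_rec = g_e2i(g_i2e(img_emb))   # round trip through the EEG latent space
    cycle = F.l1_loss(eeg_rec, eeg_emb) + F.l1_loss(img_rec, img_emb)
    return lam * cycle

# Hypothetical usage: 128-d EEG embeddings paired with 512-d image embeddings.
g_e2i, g_i2e = Translator(128, 512), Translator(512, 128)
eeg_emb = torch.randn(32, 128)
img_emb = torch.randn(32, 512)
loss = bidirectional_cycle_loss(eeg_emb, img_emb, g_e2i, g_i2e)

In a full dual-GAN setup, each translator would additionally be trained against a modality-specific discriminator; only the cycle term is shown here.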
Relevance To Conference: This work proposes a novel method, Multimodal Bidirectional Cycle Consistency (MB2C), for learning robust visual neural representations from EEG-based brain activity, advancing the state of the art in neural decoding. Specifically, MB2C constrains generated features to approximate the distribution of real sample features by enforcing a consistency loss between synthesized representations and the ground truth. We combine MB2C with contrastive learning to achieve cross-modal alignment between EEG and images. We then perform zero-shot tasks on the ThingsEEG dataset, as well as EEG classification and image reconstruction tasks on the EEGCVPR40 dataset, achieving state-of-the-art performance compared with other baselines. The results demonstrate that our method effectively closes the modality gap and ensures that embeddings from different modalities with similar semantics lie in the same region of the representation space. Additionally, we conduct experiments demonstrating the feasibility of extracting natural-image information from EEG signals. Finally, although this paper focuses on EEG and images, we show that MB2C can generalize to other pairs of modalities. Our contribution is therefore highly relevant to the scope of the ACM MM conference: it not only decodes human brain activity but also opens new avenues for research and applications in multimodal learning.
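
For the cross-modal alignment mentioned above, a common choice is a symmetric CLIP-style InfoNCE objective. The sketch below is only an assumed formulation (PyTorch; it presumes EEG and image embeddings have already been projected to a shared dimensionality, and the function name and temperature value are illustrative), showing how such a contrastive term could sit alongside the cycle loss.

import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired EEG/image embeddings
    (both inputs assumed to share the same dimensionality)."""
    eeg = F.normalize(eeg_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = eeg @ img.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(eeg.size(0), device=eeg.device)    # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

In training, this contrastive term and the cycle-consistency term would typically be summed with weighting coefficients into a single objective.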
Supplementary Material: zip
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Experience] Multimedia Applications
Submission Number: 3266