Joint Learning for Visual Reconstruction from the Brain Activity: Hierarchical Representation of Image Perception with EEG-Vision Transformer

Ali Akbari; Kosar Sanjar Arani; Tony Yousefnezhad; Maryam Mirian; Emad Arasteh

Published: 10 Oct 2024, Last Modified: 03 Nov 2024 | UniReps | CC BY 4.0

Supplementary Material: zip

Track: Proceedings Track

Keywords: Image reconstruction, Joint learning, Hierarchical, Vision transformer, Brain decoding, GAN, BCI

TL;DR: Hierarchical-ViT is a model that reconstructs visual images from EEG data using hierarchical visual features, Vision transformers, and CLIP-based learning, achieving superior image quality on benchmark datasets.

Abstract: Reconstructing visual stimuli from brain activity is a challenging problem, particularly with EEG data, which is more affordable and accessible than fMRI but noisier and lower in spatial resolution. In this paper, we present Hierarchical-ViT, a novel framework designed to improve the quality and precision of EEG-based image reconstruction by integrating hierarchical visual feature extraction, vision transformer-based EEG (EEG-ViT) processing, and CLIP-based joint learning. Inspired by the hierarchical nature of the human visual system, our model progressively captures complex visual features, such as edges, textures, and shapes, through multi-stage processing. These features are aligned with EEG signals processed by the EEG-ViT model, creating a shared latent space that enhances contrastive learning. A StyleGAN is then employed to generate high-resolution images from these aligned representations. We evaluated our method on two benchmark datasets, EEGCVPR40 and ThoughtViz, achieving superior results compared to existing approaches in terms of Inception Score (IS), Kernel Inception Distance (KID), and Fréchet Inception Distance (FID) on EEGCVPR40, and in terms of IS and KID on ThoughtViz. An ablation study supported the feasibility of hierarchical feature extraction, while a multivariate analysis of variance (MANOVA) test confirmed the distinctiveness of the learned feature spaces. In conclusion, our results show the feasibility and uniqueness of combining hierarchical filtering of perceived images with EEG-ViT-based features to improve brain decoding from EEG data.
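
The alignment of EEG and image features in a shared latent space can be illustrated with a generic CLIP-style symmetric contrastive loss. The sketch below is not the authors' implementation: the function names, the temperature of 0.07, and the embedding sizes are all assumptions chosen for illustration, and the random arrays merely stand in for EEG-ViT outputs and hierarchical image features.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of eeg_emb and row i of img_emb are assumed to come from the same
    trial (a positive pair); every other row in the batch acts as a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    eeg = l2_normalize(eeg_emb)
    img = l2_normalize(img_emb)
    logits = eeg @ img.T / temperature           # (batch, batch) similarities
    idx = np.arange(logits.shape[0])
    # EEG -> image direction: softmax over each row.
    loss_e2i = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Image -> EEG direction: softmax over each column.
    loss_i2e = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_e2i + loss_i2e)


# Synthetic stand-ins for the two feature streams (hypothetical shapes).
rng = np.random.default_rng(0)
img_features = rng.normal(size=(8, 16))
eeg_features = img_features + 0.01 * rng.normal(size=(8, 16))

aligned_loss = clip_style_loss(eeg_features, img_features)
shuffled_loss = clip_style_loss(img_features[::-1], img_features)
```

With correctly paired rows the loss is small, while reversing the pairing order makes it much larger. That gap is the signal a contrastive objective of this kind uses to pull matching EEG and image embeddings together.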

Submission Number: 43
