Structured Multivariate Time-Series Modeling for Diffusion-Based EEG-to-Image Reconstruction

Published: 01 Jun 2026, Last Modified: 01 Jun 2026
Venue: CVPR 2026 Workshop WiCV Proceedings Track Poster
Readers: Everyone
License: CC BY 4.0
Keywords: EEG, Time-series, Image reconstruction, Diffusion Model, CLIP
TL;DR: We explicitly formulate EEG as a structured multichannel time-series process and introduce a transformer-based encoding framework tailored for hierarchical temporal modeling.
Abstract: Reconstructing visual stimuli from electroencephalography (EEG) signals is a challenging problem at the intersection of computer vision and neuroscience. Unlike fMRI, EEG offers high temporal resolution and portability but suffers from low spatial specificity and high noise. We propose a time-series–driven EEG-to-image reconstruction framework that models EEG explicitly as multivariate temporal data using a Channel-Aligned Robust Blend Transformer (CARD). The extracted spatiotemporal representations are aligned to the CLIP semantic space and used to condition a Stable Diffusion model for image synthesis. Experiments on a public EEG-visual dataset demonstrate superior perceptual fidelity (FID 3.57) and structural similarity (SSIM 0.504) compared to VAE, GAN, and prior diffusion-based baselines. Extensive ablation studies validate the importance of temporal modeling and semantic alignment.
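The abstract does not say which objective aligns the EEG representations to the CLIP semantic space. As an illustration only, a minimal sketch of one common choice is shown here: a symmetric contrastive (InfoNCE-style) loss between a batch of EEG embeddings and the CLIP image embeddings of the matching stimuli. The function names, the temperature value of 0.07, and the use of plain Python lists are assumptions made for this example and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_alignment_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss (hypothetical helper, not the paper's code).

    eeg_emb[i] and img_emb[i] are assumed to come from the same trial:
    the EEG encoder output and the CLIP embedding of the viewed image.
    """
    eeg = [l2_normalize(v) for v in eeg_emb]
    img = [l2_normalize(v) for v in img_emb]
    # Cosine similarity between every EEG/image pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(e, i)) / temperature for i in img]
        for e in eeg
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the EEG-to-image and image-to-EEG directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


if __name__ == "__main__":
    eeg = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[2.0, 0.0], [0.0, 3.0]]   # same directions as the EEG vectors
    swapped = [[0.0, 3.0], [2.0, 0.0]]   # pairs deliberately mismatched
    print(clip_alignment_loss(eeg, matched) < clip_alignment_loss(eeg, swapped))
```

Under this assumed objective, correctly paired embeddings yield a loss near zero while mismatched pairs are penalized heavily. Embeddings trained this way could then stand in for CLIP conditioning vectors in the diffusion model.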
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 22