Simple Temporal Attention Beats Complex Decoders for Neural-to-Visual Mapping from Primate Spiking Data
Keywords: Brain decoding, Intracortical spiking activity, Image reconstruction, Computational neuroscience, Cognitive science
TL;DR: We investigate decoding of visual information from primate intracortical recordings and propose models that emphasize temporal dynamics over architectural complexity, yielding accurate and interpretable reconstructions.
Abstract: Understanding how neural activity gives rise to perception remains a fundamental challenge in neuroscience. Here, we address the problem of visual decoding from high-density intracortical recordings in primates using the THINGS Ventral Stream Spiking Dataset. We systematically evaluate the effects of model architecture, loss function, and temporal aggregation, showing that decoding accuracy is primarily driven by temporal dynamics rather than architectural complexity. A lightweight model combining temporal attention with a shallow MLP achieves up to 70% top-1 image retrieval accuracy, outperforming linear and recurrent baselines. Building on this, we introduce a modular generative pipeline that combines low-resolution latent reconstruction with semantically guided diffusion. By generating and ranking multiple candidate images via rejection sampling, our approach enables photorealistic reconstructions from 200 ms of brain activity. These results provide actionable insights for neural decoding and establish a flexible framework for future brain–computer interfaces and semantic reconstruction from brain signals.
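The abstract's core claim is that a lightweight decoder combining temporal attention with a shallow MLP outperforms more complex alternatives. Below is a minimal, hypothetical sketch of such a decoder: temporal self-attention over binned spike counts, mean pooling over time, and a shallow MLP mapping to an image-latent space used for retrieval. All module names, dimensions, bin counts, and the pooling scheme are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a temporal-attention + shallow-MLP neural decoder.
# Assumptions: spikes are binned counts of shape (batch, n_bins, n_units);
# the target is an image embedding used for nearest-neighbor retrieval.
import torch
import torch.nn as nn

class TemporalAttentionDecoder(nn.Module):
    def __init__(self, n_units: int, embed_dim: int = 256, latent_dim: int = 512):
        super().__init__()
        # Project each time bin's population vector to a shared embedding.
        self.proj = nn.Linear(n_units, embed_dim)
        # Self-attention over the time axis highlights when informative activity occurs.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Shallow MLP head maps the time-pooled representation to the image latent.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 2 * embed_dim),
            nn.GELU(),
            nn.Linear(2 * embed_dim, latent_dim),
        )

    def forward(self, spikes: torch.Tensor) -> torch.Tensor:
        # spikes: (batch, n_bins, n_units) binned counts within the ~200 ms window.
        x = self.proj(spikes)                 # (batch, n_bins, embed_dim)
        attn_out, _ = self.attn(x, x, x)      # temporal attention across bins
        x = self.norm(x + attn_out)           # residual connection + layer norm
        pooled = x.mean(dim=1)                # average over time bins
        return self.mlp(pooled)               # predicted image latent

# Example usage with made-up sizes: 20 bins of 10 ms over 300 recorded units.
model = TemporalAttentionDecoder(n_units=300)
dummy_spikes = torch.randn(8, 20, 300)
latents = model(dummy_spikes)  # (8, 512)
```

In a retrieval setting like the one described, predicted latents would be compared (e.g., by cosine similarity) against embeddings of candidate images; the same latents could seed a diffusion model and rank its sampled candidates, as in the paper's rejection-sampling pipeline.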
Submission Number: 13