Transformer brain encoders explain human high-level visual responses

Hossein Adeli; Minni Sun; Nikolaus Kriegeskorte

Published: 18 Sept 2025, Last Modified: 29 Oct 2025

Venue: NeurIPS 2025 spotlight

License: CC BY 4.0

Keywords: Brain encoding, visual processing, transformers

TL;DR: We present a transformer brain encoder that achieves state-of-the-art performance by using a cross-attention mechanism between brain regions and image features, efficiently mapping high-dimensional retinotopic features to brain areas.

Abstract: A major goal of neuroscience is to understand brain computations during visual processing in naturalistic settings. A dominant approach is to use image-computable deep neural networks trained with different task objectives as a basis for linear encoding models. However, in addition to requiring estimation of a large number of linear encoding parameters, this approach ignores the structure of the feature maps in both the brain and the models. Recently proposed alternatives factor the linear mapping into separate sets of spatial and feature weights, thus finding static receptive fields for units, which is appropriate only for early visual areas. In this work, we employ the attention mechanism used in the transformer architecture to study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. We show that this computational motif is significantly more powerful than alternative methods in predicting brain activity during natural scene viewing, across different feature basis models and modalities. We also show that this approach is inherently more interpretable, as the attention-routing signals for different high-level categorical areas can be easily visualized for any input image. Given its high performance at predicting brain responses to novel images, the model deserves consideration as a candidate mechanistic model of how visual information from retinotopic maps is routed in the human brain based on the relevance of the input content to different category-selective regions. Our code is available at https://github.com/Hosseinadeli/transformer_brain_encoder/.
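To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of the two readout styles it describes: a factorized mapping with fixed spatial weights (static receptive fields) and a cross-attention readout in which per-region queries attend over image patch features. It is not the authors' implementation, which lives in the linked repository. All function names, shapes, and the random weights are assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def static_readout(features, spatial_w, feature_w):
    """Factorized linear mapping (hypothetical helper).

    features:  (P, D) patch features for one image
    spatial_w: (U, P) one fixed spatial map per unit, identical for every image
    feature_w: (U, D) one feature weight vector per unit
    returns:   (U,)   predicted response per unit
    """
    pooled = spatial_w @ features              # (U, D) spatially pooled features
    return np.sum(pooled * feature_w, axis=1)  # dot product with feature weights

def attention_readout(features, roi_queries, w_key, w_value, readouts):
    """Cross-attention readout (hypothetical helper).

    features:    (P, D)  patch features for one image
    roi_queries: (R, Dk) one query vector per brain region
    w_key:       (D, Dk) key projection
    w_value:     (D, Dv) value projection
    readouts:    list of R arrays, each (Dv, V_r), mapping to that region's voxels
    returns:     (list of R arrays of shape (V_r,), attention map of shape (R, P))
    """
    keys = features @ w_key                                  # (P, Dk)
    values = features @ w_value                              # (P, Dv)
    scores = roi_queries @ keys.T / np.sqrt(keys.shape[1])   # (R, P)
    attn = softmax(scores, axis=1)     # spatial weights that depend on the image
    routed = attn @ values                                   # (R, Dv)
    preds = [routed[r] @ readouts[r] for r in range(len(readouts))]
    return preds, attn

# Toy example with assumed sizes: 16 patches, 8 feature dims, 3 regions, 5 voxels each.
rng = np.random.default_rng(0)
P, D, Dk, Dv, R, V = 16, 8, 4, 6, 3, 5
image_a = rng.normal(size=(P, D))
image_b = rng.normal(size=(P, D))
roi_queries = rng.normal(size=(R, Dk))
w_key = rng.normal(size=(D, Dk))
w_value = rng.normal(size=(D, Dv))
readouts = [rng.normal(size=(Dv, V)) for _ in range(R)]

preds_a, attn_a = attention_readout(image_a, roi_queries, w_key, w_value, readouts)
preds_b, attn_b = attention_readout(image_b, roi_queries, w_key, w_value, readouts)

# Static baseline: the same spatial map is applied to both images.
spatial_w = softmax(rng.normal(size=(R, P)), axis=1)
feature_w = rng.normal(size=(R, D))
static_a = static_readout(image_a, spatial_w, feature_w)
```

In this sketch, each row of `attn_a` is a distribution over image patches for one region, which is the kind of per-image routing map the abstract says can be visualized. The static baseline reuses `spatial_w` for every image, whereas `attn_a` and `attn_b` differ because the attention weights are computed from the image content.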

Supplementary Material: zip

Primary Area: Neuroscience and cognitive science (e.g., neural coding, brain-computer interfaces)

Submission Number: 18823
