Topographic Vision Transformers

Published: 12 Jun 2025 · Last Modified: 03 Sept 2025 · The 8th Annual Conference on Cognitive Computational Neuroscience · CC BY 4.0
Abstract: Functional organization in the form of topographic maps is a hallmark of many cortical systems and is believed to arise from biophysical efficiency, such as the minimization of neuronal wiring length. Recently, Margalit et al. (2024) introduced the TDANN, a topographic convolutional neural network (CNN) that recapitulated gross ventral stream topography while minimizing feedforward wiring length. However, standard CNNs lack mechanisms for the within-layer long-range interactions that are well documented in primate visual cortex. Here we leverage a vision transformer (ViT), which learns to behave locally like a CNN through training while retaining long-range interactions via self-attention, to learn topographic properties. We find that a topographic ViT reproduces key topographic motifs, maintains high object categorization performance, and shows reduced inter- and intra-layer wiring length. We thus introduce a new class of topographic models that can express hypotheses about the roles of local vs. long-range cortical interactions in the brain.
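To make the idea of a topographic objective concrete, below is a minimal sketch of a TDANN-style spatial correlation loss: each unit in a layer is assigned a fixed 2D "cortical" position, and the loss penalizes mismatch between pairwise response correlations and an inverse-distance target, so that nearby units come to respond similarly. This is an illustrative assumption about the form of the objective (the function name, the `1/(d+1)` target, and the random data are invented for this sketch), not the exact loss used in the paper.

```python
import numpy as np

def spatial_loss(responses, positions):
    """Illustrative TDANN-style spatial correlation loss.

    responses: (batch, units) activations of one layer across stimuli
    positions: (units, 2) fixed 2D coordinates assigned to the units
    Penalizes the gap between each pair's response correlation and an
    inverse-distance target, encouraging smooth topographic maps.
    """
    # pairwise Pearson correlations between unit response profiles
    corr = np.corrcoef(responses.T)              # (units, units)
    # pairwise Euclidean distances between assigned positions
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))          # (units, units)
    target = 1.0 / (dist + 1.0)                  # decays with distance
    mask = ~np.eye(len(positions), dtype=bool)   # exclude self-pairs
    return float(np.mean((corr[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
resp = rng.standard_normal((32, 16))             # 32 stimuli, 16 units
pos = rng.uniform(0.0, 1.0, size=(16, 2))        # units on a unit square
print(spatial_loss(resp, pos))
```

In a topographic ViT, a penalty of this kind would be added to the task loss for the units of each layer (e.g., per attention head or MLP channel), with positions fixed before training; self-attention then supplies the long-range interactions that a CNN lacks.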