Disentangling the Factors of Convergence between Brains and DINOv3

Published: 26 Jan 2026 | Last Modified: 01 Mar 2026 | ICLR 2026 Poster | CC BY 4.0
Keywords: NeuroAI; Brain–AI alignment; Representational alignment; Hierarchy alignment; Emergence; Vision transformers; Self-supervised learning; fMRI; MEG; Temporal dynamics; Spatial dynamics; Cortical hierarchy; Development
TL;DR: We disentangle how architecture, training, and data shape brain-like representations and hierarchy in vision models (e.g., DINOv3). Strikingly, the development of these vision models over training mirrors several aspects of human brain development.
Abstract: Many AI models trained on natural images develop representations that resemble those of the human brain. However, the factors driving this brain-model similarity remain poorly understood. To disentangle how model, training, and data independently lead a neural network to develop brain-like representations, we train a family of self-supervised vision transformers (DINOv3) that systematically vary these factors. We compare their representations of images to those of the human brain recorded with fMRI and MEG, providing high spatial and temporal resolution. We assess brain-model similarity with three complementary metrics focusing on representational similarity, topographical organization, and temporal dynamics. We show that all three factors - model size, training amount, and image type - independently and interactively impact each of these brain-similarity metrics. In particular, the largest DINOv3 models trained on the most human-centric images reach the highest brain similarity. These findings generalize across seven additional models. The emergence of brain-like representations in AI models follows a specific chronology during training: models first align with the early representations of the sensory cortices, and only align with the late and prefrontal representations of the brain after considerably more training. Finally, this developmental trajectory is indexed by structural and functional properties of the human cortex: representations acquired last by the models specifically align with cortical areas showing the largest developmental expansion, greatest thickness, least myelination, and slowest timescales. Overall, these findings disentangle the interplay between architecture and experience in shaping how artificial neural networks come to see the world as humans do, offering a promising framework for understanding how the human brain comes to represent its visual world.
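The representational-similarity metric mentioned in the abstract is typically implemented via representational similarity analysis (RSA): build a dissimilarity matrix over stimuli for the model and for the brain recordings, then correlate the two. A minimal illustrative sketch (our own toy data and function names, not the paper's actual pipeline) might look like:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix: condensed vector of
    pairwise correlation distances between per-stimulus responses.
    `responses` has shape (n_stimuli, n_features)."""
    return pdist(responses, metric="correlation")

def rsa_score(model_features, brain_responses):
    """Spearman correlation between the two RDMs; higher values mean
    the two systems share more representational geometry."""
    rho, _ = spearmanr(rdm(model_features), rdm(brain_responses))
    return rho

# Toy example: 20 stimuli projected from a shared 8-dim latent space
# into 64 model units and 100 "voxels" (plus measurement noise).
rng = np.random.default_rng(0)
latents = rng.normal(size=(20, 8))
model_features = latents @ rng.normal(size=(8, 64))
brain_responses = latents @ rng.normal(size=(8, 100)) \
    + 0.1 * rng.normal(size=(20, 100))
print(rsa_score(model_features, brain_responses))
```

Because both toy systems inherit structure from the same latents, the score is clearly positive; with unrelated responses it hovers near zero.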
Supplementary Material: pdf
Primary Area: applications to neuroscience & cognitive science
Submission Number: 12743