Monkey See, Model Knew: Large Language Models Accurately Predict Human AND Macaque Visual Brain Activity

Published: 10 Oct 2024, Last Modified: 07 Nov 2024 (UniReps, CC BY 4.0)
Track: Extended Abstract Track
Keywords: vision-language; large language models; human visual cortex; macaque visual cortex; fMRI; comparative neuroscience
TL;DR: In this work, we show that large language models can accurately predict image-evoked visual activity in human AND macaque visual cortex. This suggests that the predictions of language models aren't necessarily about language per se.
Abstract: Recent progress in multimodal AI and “language-aligned” visual representation learning has rekindled debates about the role of language in shaping the human visual system. In particular, the emergent ability of “language-aligned” vision models (e.g. CLIP) -- and even pure language models (e.g. BERT) -- to predict image-evoked brain activity has led some to suggest that human visual cortex itself may be “language-aligned” in comparable ways. But what should we make of this claim if the same procedures work equally well for modeling visual activity in a species that does not have language? Here, we deploy controlled comparisons of pure-vision, pure-language, and multimodal vision-language models in predicting human (N=4) and rhesus macaque (N=6; 5 with IT recordings, 1 with V1) ventral visual activity evoked by the same set of 1000 captioned natural images (the “NSD1000”). The results reveal markedly similar patterns of aggregate model predictivity for early and late ventral visual cortex in both species. This suggests that language-model predictivity of the human visual system is not necessarily due to the evolution or learning of language per se, but rather to the statistical structure of the visual world that is reflected in the statistics of language as data.
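
The abstract does not spell out the prediction pipeline; a minimal sketch of the standard cross-validated encoding-model approach such comparisons typically use is shown below. The function name, fold count, and regularization grid are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: ridge-regress model embeddings (vision, language, or
# vision-language) onto measured responses for the NSD1000 images, then score
# held-out predictivity per voxel (human fMRI) or site (macaque recordings).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def predictivity(features, responses, n_splits=5, alphas=np.logspace(-2, 5, 8)):
    """Cross-validated Pearson r between predicted and measured responses.

    features  : (n_images, n_dims)  embeddings of the images or their captions
    responses : (n_images, n_units) voxel or recording-site responses
    Returns one mean held-out correlation per unit.
    """
    fold_scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(features):
        model = RidgeCV(alphas=alphas).fit(features[train], responses[train])
        pred = model.predict(features[test])
        # Pearson r per unit on the held-out images (mean product of z-scores)
        pz = (pred - pred.mean(0)) / pred.std(0)
        rz = (responses[test] - responses[test].mean(0)) / responses[test].std(0)
        fold_scores.append((pz * rz).mean(0))
    return np.mean(fold_scores, axis=0)
```

Comparing the score distributions this returns for pure-vision, pure-language, and multimodal feature sets, separately for early and late ventral regions in each species, is one straightforward way to produce the kind of aggregate predictivity comparison the abstract describes.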
Submission Number: 76