Rethinking Language-Alignment in Human Visual Cortex with Syntax Manipulation and Word Models

ICLR 2025 Conference Submission 13339 Authors (anonymous)

28 Sept 2024 (modified: 28 Nov 2024), ICLR 2025 Conference Submission, License: CC BY 4.0
Keywords: multimodality, language models, vision models, visuosemantics, visual neuroscience
TL;DR: By systematically perturbing their inputs, we show that the ability of language models to predict activity in high-level visual cortex may largely reduce to co-occurrence statistics between simple nouns in no particular syntactic order.
Abstract: Recent success in predicting human ventral visual system responses to images from large language model (LLM) representations of image captions has sparked renewed interest in the possibility that high-level visual representations are aligned to language. Here, we further explore this possibility using image-caption pairs from the Natural Scenes fMRI Dataset, examining how well language-only representations of image captions predict image-evoked human visual cortical responses, compared to predictions based on vision model responses to the images themselves. As in recent work, we find that unimodal language models predict brain responses in human visual cortex as well as unimodal vision models do. However, we find that the predictive power of large language models rests almost entirely on their ability to capture information about the nouns present in image descriptions, with little to no role for syntactic structure or semantic compositionality in predicting neural responses to static natural scenes. We propose that the convergence between language-model and vision-model representations and those of high-level visual cortex arises not from direct interaction between vision and language, but from common reference to real-world entities, and from the prediction of brain data whose principal variance is defined by common objects in common, non-compositional contexts.
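The perturbation-plus-encoding analysis the abstract describes can be pictured with a minimal sketch, which is not the authors' code: embed is a hypothetical stand-in for whatever language-model caption embedding the submission uses, and ridge regression with a held-out split is an assumed (though common) encoding-model choice. The two perturbations shown, word shuffling (destroying syntax) and noun extraction (discarding everything but nouns), correspond to the manipulations the TL;DR and abstract name.

# Minimal sketch of a caption-perturbation encoding analysis (assumed setup,
# not the authors' implementation). Requires: pip install spacy scikit-learn
# and python -m spacy download en_core_web_sm
import random
import numpy as np
import spacy
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

nlp = spacy.load("en_core_web_sm")

def identity(caption: str) -> str:
    """Intact caption: the unperturbed baseline."""
    return caption

def shuffle_words(caption: str) -> str:
    """Destroy syntactic structure by randomly reordering the words."""
    words = caption.split()
    random.shuffle(words)
    return " ".join(words)

def nouns_only(caption: str) -> str:
    """Reduce the caption to its nouns, discarding all other words."""
    return " ".join(t.text for t in nlp(caption) if t.pos_ == "NOUN")

def encoding_score(features: np.ndarray, voxels: np.ndarray) -> float:
    """Fit a ridge encoding model and return held-out R^2 across voxels."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, voxels, random_state=0)
    return Ridge(alpha=1.0).fit(X_tr, y_tr).score(X_te, y_te)

# captions: list of NSD image captions; voxels: (n_images, n_voxels) fMRI
# responses; embed: hypothetical caption -> vector function from an LLM.
# If the nouns-only score approaches the intact-caption score, syntax
# contributes little to the prediction, which is the abstract's claim.
# for perturb in (identity, shuffle_words, nouns_only):
#     X = np.stack([embed(perturb(c)) for c in captions])
#     print(perturb.__name__, encoding_score(X, voxels))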
Primary Area: applications to neuroscience & cognitive science
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13339