One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment

Published: 22 Jan 2025 · Last Modified: 10 Mar 2025 · ICLR 2025 Poster · CC BY 4.0
Keywords: representational alignment, Representational Similarity Analysis, RSA, benchmarking, neuro-AI, video AI, neuroscience, fMRI, cognitive AI
TL;DR: We benchmark 99 image and video models on brain representational alignment to fMRI data of humans watching video.
Abstract: What can we learn from comparing video models to human brains, arguably the most efficient and effective video processing systems in existence? Our work takes a step towards answering this question by performing the first large-scale benchmarking of deep video models on representational alignment to the human brain, using publicly available models and a recently released video brain imaging (fMRI) dataset. We disentangle four factors of variation in the models (temporal modeling, classification task, architecture, and training dataset) that affect alignment to the brain, which we measure by conducting Representational Similarity Analysis across multiple brain regions and model layers. We show that temporal modeling is key for alignment to brain regions involved in early visual processing, while a relevant classification task is key for alignment to higher-level regions. Moreover, we identify clear differences between the brain scoring patterns across layers of CNNs and Transformers, and reveal how training dataset biases transfer to alignment with functionally selective brain areas. Additionally, we uncover a negative correlation between computational complexity and brain alignment. Measuring a total of 99 neural networks and 10 human brains watching videos, we aim to forge a path that widens our understanding of temporal and semantic video representations in brains and machines, ideally leading towards more efficient video models and more mechanistic explanations of processing in the human brain.
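
The alignment measure named in the abstract, Representational Similarity Analysis (RSA), works by building a representational dissimilarity matrix (RDM) of pairwise dissimilarities between stimulus responses, once for a model layer and once for a brain region, and then correlating the two RDMs. The sketch below is a minimal illustration of that general recipe, not the paper's exact pipeline: the choice of correlation-distance RDMs, the Spearman comparison, and all variable names are assumptions made for the example.

```python
# Minimal RSA sketch (illustrative assumptions, not the paper's exact pipeline):
# compare one model layer's responses to fMRI responses for the same stimuli.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix in condensed form.
    responses: (n_stimuli, n_features); distance = 1 - Pearson correlation."""
    return pdist(responses, metric="correlation")

def rsa_score(model_layer: np.ndarray, brain_region: np.ndarray) -> float:
    """Spearman correlation between the model-layer RDM and the brain-region RDM."""
    rho, _ = spearmanr(rdm(model_layer), rdm(brain_region))
    return rho

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 50 video clips, 512 model features, 1000 fMRI voxels.
    model_acts = rng.standard_normal((50, 512))
    voxels = rng.standard_normal((50, 1000))
    print(f"RSA score: {rsa_score(model_acts, voxels):.3f}")
```

Repeating this score over every (model layer, brain region) pair is what yields the layer-wise and region-wise alignment patterns the abstract refers to.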
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11218