Visual Representations in Humans and Machines: A Comparative Analysis of Artificial and Biological Neural Responses to Naturalistic Dynamic Visual Stimuli
Keywords: self-supervised learning, visual representation, occipitotemporal cortex, human vision, masked autoencoders
TL;DR: Masked Autoencoders learn visual representations that diverge from human neural responses; video MAEs, which capture temporal information, align more closely than image MAEs, but optic flow-based convolutional networks outperform both.
Abstract: Visual representations in the human brain are shaped by the pressure to support planning and interactions with the environment. Do visual representations in deep network models converge with visual representations in humans? Here, we investigate this question for a new class of effective self-supervised models: Masked Autoencoders (MAEs). We compare image MAEs and video MAEs to neural responses in humans as well as to representations in convolutional neural networks. The results reveal that representations learned by MAEs diverge from neural representations in humans and from those in convolutional neural networks. Fine-tuning MAEs on a supervised task improves their correspondence with neural responses but is not sufficient to bridge the gap that separates them from supervised convolutional networks. Finally, video MAEs show closer correspondence to neural representations than image MAEs, revealing an important role for temporal information. However, convolutional networks based on optic flow correspond more closely to human neural responses than even video MAEs, indicating that while masked autoencoding yields visual representations that are effective at multiple downstream tasks, it is not sufficient to learn representations that converge with human vision.
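The abstract does not specify how model representations are compared to neural responses; a common choice in this literature is representational similarity analysis (RSA), which correlates the pairwise dissimilarity structure of model features with that of brain responses. The sketch below illustrates second-order RSA under that assumption; the feature sources, array shapes, and use of Spearman correlation are illustrative placeholders, not the authors' actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix (condensed form):
    1 - Pearson r between all pairs of rows (stimuli x features)."""
    return pdist(features, metric="correlation")

def rsa_score(model_features: np.ndarray, neural_responses: np.ndarray) -> float:
    """Spearman correlation between the model RDM and the neural RDM,
    i.e., a second-order similarity comparison."""
    rho, _ = spearmanr(rdm(model_features), rdm(neural_responses))
    return rho

# Hypothetical usage: 100 video clips; MAE features vs. voxel responses
# in occipitotemporal cortex. Random data stands in for real features.
mae_feats = np.random.randn(100, 768)     # e.g., ViT embeddings per clip
otc_voxels = np.random.randn(100, 2000)   # e.g., response estimates per voxel
print(f"Model-brain RSA: {rsa_score(mae_feats, otc_voxels):.3f}")
```

Repeating this comparison across model families (image MAEs, video MAEs, supervised and optic flow-based convolutional networks) would yield the kind of alignment ranking the abstract reports.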
Primary Area: applications to neuroscience & cognitive science
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12459