Track: Proceedings Track
Keywords: Action Recognition, Masked Autoencoders, Convolutional Neural Networks, Human Visual System
TL;DR: Neural responses to videos align better with models trained on dynamic information, with optic flow models capturing unique brain activity that Masked Autoencoders miss, particularly in early visual processing stages.
Abstract: We compared neural responses to naturalistic videos with representations in deep network models trained with static and dynamic information. Models trained with dynamic information showed greater correspondence with neural representations in all brain regions, including those previously associated with the processing of static information. Among the models trained with dynamic information, those based on optic flow accounted for unique variance in neural responses that was not captured by Masked Autoencoders. This effect was strongest in ventral and dorsal brain regions, indicating that despite the Masked Autoencoders' effectiveness at a variety of tasks, their representations diverge from representations in the human brain in the early stages of visual processing.
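The unique-variance claim in the abstract corresponds to a standard variance-partitioning analysis: the variance a feature set explains beyond what a competing feature set already accounts for. The sketch below illustrates the idea with nested linear regressions on synthetic data; all names (`flow_feats`, `mae_feats`, `voxel`) are illustrative placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins: feature matrices from two model families and
# one voxel's response vector (synthetic data for illustration only).
n_samples = 200
flow_feats = rng.normal(size=(n_samples, 10))  # e.g., optic-flow model features
mae_feats = rng.normal(size=(n_samples, 10))   # e.g., Masked Autoencoder features
voxel = flow_feats[:, 0] + 0.5 * mae_feats[:, 0] + rng.normal(scale=0.5, size=n_samples)

def r2(X, y):
    """In-sample R^2 of an ordinary least-squares fit."""
    return LinearRegression().fit(X, y).score(X, y)

# Unique variance of the flow features = R^2(both feature sets) - R^2(MAE only).
full = r2(np.hstack([flow_feats, mae_feats]), voxel)
mae_only = r2(mae_feats, voxel)
unique_flow = full - mae_only
print(f"unique variance explained by flow features: {unique_flow:.3f}")
```

In practice such analyses typically use cross-validated rather than in-sample R^2; the nested-model subtraction shown here is the core of the "unique variance" logic either way.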
Submission Number: 66