Keywords: processing streams, vision, neuroscience, fMRI, topography, convolutional neural networks, self-supervised learning
TL;DR: A model trained with a spatial constraint, on a single self-supervised task, can recapitulate the functional organization of visual cortex into multiple processing streams.
Abstract: A key organizing principle of visual cortex is functional specialization, whether locally, as in category-selective patches, or on a broader scale, as in visual processing streams. Primate visual cortex has traditionally been divided into two such processing streams, though recent research suggests that there may be at least three functionally and anatomically distinct streams, extending along the ventral, lateral, and parietal surfaces of the brain. While processing streams are typically characterized by the downstream behaviors or tasks they support, we ask instead whether anatomical constraints may be sufficient to produce this differentiation, even with just one task objective. Comparing directly to human fMRI responses, we show that a model trained on a single task with novel anatomical constraints (Topographic DCNN) can capture not only the functional responses but also the segregation of visual cortex into distinct processing streams. The match to human data is strongest for a self-supervised rather than a supervised objective, and when the anatomical constraint, which encourages local response correlations as a proxy for minimizing wiring length, is appropriately weighted. These results suggest that the broad-scale functional organization of visual cortex into parallel processing streams may be explained by the pressure to minimize biophysical costs such as wiring length, and that local spatial constraints can, surprisingly, percolate to create broad-scale structure.
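
To make the idea of a weighted spatial constraint concrete, the following is a minimal sketch of one way such a term could be written. It is not the authors' formulation: the function names (`spatial_loss`, `total_loss`), the weight `alpha`, the 2-D `positions` array, and the `1 / (1 + distance)` proximity measure are all assumptions chosen for illustration. The penalty is small when units that sit close together on a simulated cortical sheet also have correlated responses.

```python
import numpy as np

def spatial_loss(responses, positions):
    """Illustrative penalty (an assumption, not the paper's exact loss).

    responses: array of shape (n_stimuli, n_units)
    positions: array of shape (n_units, 2), hypothetical unit coordinates
    Returns a value in [0, 1]; lower means nearby units respond more alike.
    """
    n_units = responses.shape[1]
    # Pairwise Pearson correlation between unit response profiles.
    resp_corr = np.corrcoef(responses, rowvar=False)
    # Pairwise Euclidean distance between unit positions.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Proximity measure: 1 for coincident units, decaying with distance.
    inv_dist = 1.0 / (1.0 + dist)
    # Count each unordered pair of units once (upper triangle, no diagonal).
    iu = np.triu_indices(n_units, k=1)
    # How well response similarity tracks spatial proximity.
    agreement = np.corrcoef(resp_corr[iu], inv_dist[iu])[0, 1]
    return 0.5 * (1.0 - agreement)

def total_loss(task_loss, responses, positions, alpha):
    """Single task objective plus the weighted spatial term (alpha is hypothetical)."""
    return task_loss + alpha * spatial_loss(responses, positions)
```

In this sketch, `alpha = 0` recovers the task objective alone, while larger values trade task performance for smoother spatial organization, which mirrors the abstract's point that the constraint has to be appropriately weighted.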