Model manifold analysis suggests the human visual brain is less like an optimal classifier and more like a feature bank

Colin Conwell; Michael Bonner

Published: 23 Sept 2025, Last Modified: 29 Oct 2025, Venue: NeurReps 2025 Poster, License: CC BY 4.0

Keywords: brain-predictivity, neural manifold geometry

TL;DR: Using metrics of manifold geometry, we attempt to better explain differences between deep neural network models of the ventral visual stream

Abstract: What do deep neural network (DNN) models tell us about the computational principles of visual information processing in the biological brain? A now-common finding in visual neuroscience is that many different kinds of DNN models, each with different architectures, tasks, and training diets, are all comparably performant predictors of image-evoked brain activity in the ventral visual cortex. This relative parity of highly diverse models may at first seem to undermine the common intuition that we can use these models to infer the key computational principles that govern the visual brain. In this work, we show to the contrary that comparable brain-predictivity does not preclude the differentiation of these same models in terms of the underlying manifold geometries that define them. To do this, we assess 12 manifold geometry metrics computed across a diverse set of 117 DNN models, curated to include multiple tasks, architectures, and input diets. We then use these metrics to predict how well each model aligns with occipitotemporal cortex (OTC) activity from the human fMRI Natural Scenes Dataset. We find that manifold signal-to-noise ratio (a metric previously associated with few-shot learning) is a robust predictor of downstream brain-alignment and outperforms both other manifold geometry metrics (e.g. manifold capacity) and downstream task-performance (e.g. top-k recognition accuracy) across multiple different image sets (e.g. ImageNet21K versus Places365) and model comparison probes (e.g. category-supervised versus self-supervised models). These results add to a growing body of evidence that the ventral visual stream serves as a basis set (or feature vocabulary) for object recognition rather than as the actual locus of recognition per se.
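
The comparison described in the abstract, asking which per-model metric best predicts brain-alignment across a set of models, can be illustrated with a minimal sketch. The abstract does not state which statistical procedure was used, so the Spearman rank correlation here is an assumption made for illustration only. The metric names (`metric_a`, `metric_b`) and all values are synthetic placeholders, not results from the paper; only the model count of 117 comes from the abstract.

```python
import random


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic placeholder data (NOT from the paper): one brain-alignment
# score per model, and two hypothetical per-model metrics.
rng = random.Random(0)
n_models = 117  # model count taken from the abstract
brain_score = [rng.gauss(0, 1) for _ in range(n_models)]
metrics = {
    # Hypothetical metric built to track the brain score, plus noise.
    "metric_a": [b + rng.gauss(0, 0.5) for b in brain_score],
    # Hypothetical metric unrelated to the brain score.
    "metric_b": [rng.gauss(0, 1) for _ in range(n_models)],
}

# Rank the candidate metrics by how strongly they track brain-alignment.
for name, values in metrics.items():
    print(name, round(spearman(values, brain_score), 3))
```

A rank-based statistic is used in this sketch because it makes no assumption that a metric relates linearly to brain-alignment; a regression-based comparison would be an equally reasonable choice.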

Submission Number: 168
