Keywords: brain alignment, benchmarking, representational similarity analysis, video models
TL;DR: We expose the limits of brain alignment of SOTA video models, and propose a framework based on cross-region alignment patterns in the brain towards more robust and meaningful assessment of brain-model alignment.
Abstract: Neuroscientists and computer vision scientists alike have relied on model-brain alignment benchmarks to find parallels between artificial and biological vision systems. These benchmarks rank models according to alignment measures (AM) such as representational similarity analysis (RSA) and linear predictivity (LP). However, recent works have revealed a number of problems with these rankings, such as their sensitivity towards the choice of AM, raising the deeper conceptual question of what it means for a model to be “brain-aligned.”
Here, we introduce the notion of *alignment patterns*, characteristic patterns of alignment between brain regions, and posit that models should reproduce these patterns in order to be considered brain-aligned.
First, we apply a standard benchmarking pipeline to a broad spectrum of vision models on the BOLD-Moments video fMRI dataset across visual regions of interest (ROIs).
We find that, while this pipeline can identify the nominally most predictive models, many other models fall within subject-level variability and are therefore practically equivalent in terms of brain alignment.
We then apply our complementary relational criterion: a ROI-aligned model should reproduce that ROI's cross-region alignment pattern. We find that, while these patterns are highly stable across subjects, even top-ranked models fail to capture them. Notably, models that appear practically equivalent in predictive accuracy diverge sharply under the relational criterion. This reveals the limited discriminative power of existing evaluation pipelines and shows that alignment pattern analysis can increase it.
Finally, we argue for a principled distinction between brain-predictivity and brain-alignment. For applications such as digital twins, prediction performance may suffice; but for understanding the inductive biases of the visual system, models should satisfy stricter distributional and relational criteria.
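To make the relational criterion in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes RSA with correlation-distance RDMs and Spearman comparison, and it assumes a ROI's alignment pattern is the vector of its RSA scores against the other ROIs. The ROI names, the synthetic data, and the helper names (`rdm`, `rsa_score`, `alignment_pattern`, `pattern_agreement`) are all hypothetical.

```python
import numpy as np

def rdm(responses):
    """RDM as 1 - Pearson r between stimulus rows (n_stimuli x n_features)."""
    return 1.0 - np.corrcoef(responses)

def _ranks(x):
    # Ordinal ranks; ties are not averaged (acceptable for continuous data).
    return np.argsort(np.argsort(x)).astype(float)

def rsa_score(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(_ranks(rdm_a[iu]), _ranks(rdm_b[iu]))[0, 1]

def alignment_pattern(source_rdm, roi_rdms, exclude=None):
    """RSA scores of one source RDM against every ROI except `exclude`."""
    names = [n for n in sorted(roi_rdms) if n != exclude]
    return np.array([rsa_score(source_rdm, roi_rdms[n]) for n in names])

def pattern_agreement(model_rdm, roi_rdms, target):
    """Pearson r between the target ROI's pattern and the model's pattern.

    Assumption: both patterns are computed over the non-target ROIs only.
    """
    brain = alignment_pattern(roi_rdms[target], roi_rdms, exclude=target)
    model = alignment_pattern(model_rdm, roi_rdms, exclude=target)
    return np.corrcoef(brain, model)[0, 1]

# Synthetic stand-in data: four illustrative ROIs sharing a latent signal.
rng = np.random.default_rng(0)
n_stimuli = 30
latent = rng.normal(size=(n_stimuli, 8))
roi_rdms = {}
for name, noise in [("V1", 0.5), ("V2", 1.0), ("LOC", 2.0), ("MT", 3.0)]:
    weights = rng.normal(size=(8, 50))
    voxels = latent @ weights + noise * rng.normal(size=(n_stimuli, 50))
    roi_rdms[name] = rdm(voxels)

# A hypothetical model layer's activations for the same stimuli.
model_rdm = rdm(latent @ rng.normal(size=(8, 64)))
agreement = pattern_agreement(model_rdm, roi_rdms, target="V1")
```

Under these assumptions, a model could score well on `rsa_score` against a single ROI and still have low `pattern_agreement` for it, which is the kind of divergence the relational criterion is meant to expose.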
Primary Area: applications to neuroscience & cognitive science
Submission Number: 13969