Abstract: Neuroscience and artificial intelligence (AI) both grapple with the challenge of interpreting high-dimensional neural data. Comparative analysis of such data is essential to uncover shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the ever-growing landscape of comparison methods, a critical question remains: which metrics are most suitable for these comparisons? Prior work often evaluates metrics by their ability to differentiate models with varying origins (e.g., different architectures), but an alternative—and arguably more informative—approach is to assess how well these metrics distinguish models with distinct behaviors. This is crucial because representational comparisons are frequently interpreted as indicators of functional similarity in NeuroAI. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes across a suite of downstream data distributions and tasks. We compare eight commonly used metrics in the visual domain, including alignment-based, CCA-based, inner product kernel-based, and nearest-neighbor-based methods, using group statistics and a comprehensive set of behavioral metrics. We find that metrics such as the Procrustes distance and linear Centered Kernel Alignment (CKA), which emphasize alignment in the overall shape or geometry of representations, excel at differentiating trained from untrained models and aligning with behavioral measures, whereas metrics such as linear predictivity, commonly used in neuroscience, demonstrate only moderate alignment with behavior. These findings highlight that some widely used representational similarity metrics may not directly map onto functional behaviors or computational goals, underscoring the importance of selecting metrics that emphasize behaviorally meaningful comparisons in NeuroAI research.
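For readers unfamiliar with the metrics named above, the sketch below illustrates linear CKA, one of the inner product kernel-based measures the abstract mentions. This is a minimal, illustrative implementation of the standard linear CKA formula, not the authors' code; the variable names and the example data are assumptions for demonstration only.

```python
# Minimal sketch of linear CKA (not the paper's implementation).
# X and Y are (n_stimuli x n_features) activation matrices from two systems;
# names and shapes here are purely illustrative.
import numpy as np

def linear_cka(X, Y):
    # Center each feature column so the comparison reflects representational
    # geometry rather than mean offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

# Example usage with synthetic data: B is a linear transform of A,
# so their linear CKA is high despite different feature dimensionalities.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 512))
B = A @ rng.standard_normal((512, 256))
print(linear_cka(A, B))
```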
Submission Number: 89