Grounding Representation Similarity Through Statistical Testing

Published: 09 Nov 2021, Last Modified: 05 May 2023
NeurIPS 2021 Poster
Keywords: representation similarity, dissimilarity, metric, CKA, CCA, Orthogonal Procrustes, benchmark, representation learning, probing, deep networks
Abstract: To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures. Unfortunately, these widely used measures often disagree on fundamental observations, such as whether deep networks differing only in random initialization learn similar representations. These disagreements raise the question: which, if any, of these dissimilarity measures should we believe? We provide a framework to ground this question through a concrete test: measures should have \emph{sensitivity} to changes that affect functional behavior, and \emph{specificity} against changes that do not. We quantify this through a variety of functional behaviors including probing accuracy and robustness to distribution shift, and examine changes such as varying random initialization and deleting principal components. We find that current metrics exhibit different weaknesses, note that a classical baseline performs surprisingly well, and highlight settings where all metrics appear to fail, thus providing a challenge set for further improvement.
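For a concrete feel for the measures named in the abstract, below is a minimal NumPy sketch of linear CKA and the orthogonal Procrustes distance between two representation matrices (one row per input example). This is an illustrative sketch under standard definitions, not the authors' implementation; their code is in the repository linked below.

```python
import numpy as np

def center(X):
    """Mean-center each column (feature) of an n x d representation matrix."""
    return X - X.mean(axis=0, keepdims=True)

def linear_cka(X, Y):
    """Linear CKA similarity between representations X (n x d1) and Y (n x d2):
    ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after centering."""
    X, Y = center(X), center(Y)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro") *
                    np.linalg.norm(Y.T @ Y, ord="fro"))

def procrustes_dist(X, Y):
    """Squared orthogonal Procrustes distance after centering and scaling
    each matrix to unit Frobenius norm:
    min over orthogonal Q of ||X - Y Q||_F^2 = 2 - 2 * ||Y^T X||_* ."""
    X, Y = center(X), center(Y)
    X = X / np.linalg.norm(X, ord="fro")
    Y = Y / np.linalg.norm(Y, ord="fro")
    nuc = np.linalg.norm(Y.T @ X, ord="nuc")  # nuclear norm: sum of singular values
    return 2 - 2 * nuc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 64))
    Y = X @ rng.standard_normal((64, 64))  # an invertible linear transform of X
    print("linear CKA:", linear_cka(X, Y))
    print("Procrustes distance:", procrustes_dist(X, Y))
```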
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: We quantitatively benchmark dissimilarity measures for representations and find that Orthogonal Procrustes performs best.
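The quantitative benchmark mentioned in the TL;DR can be pictured as follows: score many model pairs with a dissimilarity measure, score the same pairs by a difference in functional behavior (e.g., a gap in probing accuracy), and ask how well the two rankings agree. A hypothetical sketch with made-up numbers, using rank correlation as the agreement statistic (the paper's exact protocol may differ):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical inputs: for each pair of models, a representation
# dissimilarity score and a difference in functional behavior.
# These numbers are illustrative only, not results from the paper.
dissim = np.array([0.05, 0.12, 0.30, 0.45, 0.52])
func_diff = np.array([0.01, 0.03, 0.10, 0.20, 0.18])

# A measure with good sensitivity and specificity should rank model
# pairs similarly to the functional difference; rank correlation
# quantifies that agreement.
rho, pval = spearmanr(dissim, func_diff)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```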
Supplementary Material: pdf
Code: https://github.com/js-d/sim_metric