Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

TMLR Paper621 Authors

21 Nov 2022 (modified: 17 Sept 2024) | Rejected by TMLR | CC BY 4.0

Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm, or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured as the learning speed of a KNN classifier trained to predict labels obtained by clustering the representations with $K$-means. We then combine CL and ID into a single predictor, CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID correlates better with in-distribution model performance than other recent competing evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several visual classification tasks, yielding improvements over the competing baselines.
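The CL and ID computations described in the abstract can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `kmeans_labels`, `cluster_learnability`, `twonn_intrinsic_dimension` and `clid` are hypothetical names; "learning speed" is approximated by the prequential (online) accuracy of a KNN classifier that labels each point using only the points seen before it; ID is estimated with the maximum-likelihood form of the TwoNN estimator; and CLID is assumed to be a sum of z-scores across the models being compared, with higher CL and higher ID both treated as better. The abstract does not specify the ID estimator, the exact learning-speed measure, or the combination rule.

```python
import math
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_learnability(points, k, n_neighbors=3, seed=0):
    """Hypothetical CL proxy: prequential accuracy of a KNN classifier.

    Points arrive in a random order; each one is classified by majority vote
    among its nearest previously seen points, with K-means cluster labels as
    targets. A classifier that learns quickly gets a high mean accuracy."""
    labels = kmeans_labels(points, k, seed=seed)
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    correct = 0
    for t in range(1, len(order)):
        i = order[t]
        seen = order[:t]
        nearest = sorted(seen, key=lambda j: sq_dist(points[i], points[j]))[:n_neighbors]
        votes = Counter(labels[j] for j in nearest)
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / (len(order) - 1)

def twonn_intrinsic_dimension(points):
    """MLE form of the TwoNN estimator: d = N / sum_i log(r2_i / r1_i),
    where r1_i, r2_i are distances to the first and second nearest neighbours."""
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(sq_dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = math.sqrt(dists[0]), math.sqrt(dists[1])
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)

def zscores(xs):
    """Standardize a list of values (population std; constant input maps to 0)."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mean) / std for x in xs]

def clid(cl_values, id_values):
    """Hypothetical CLID: sum of standardized CL and ID over a set of models."""
    return [a + b for a, b in zip(zscores(cl_values), zscores(id_values))]

# Toy "representations": three well-separated 2-D Gaussian blobs.
rng = random.Random(0)
points = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)] for _ in range(50)]
cl = cluster_learnability(points, k=3)
dim = twonn_intrinsic_dimension(points)
print(f"CL = {cl:.3f}, ID = {dim:.2f}")
```

On well-separated blobs the KNN learns the cluster labels almost immediately, so CL comes out high, and the ID estimate lands near 2, the dimension of the plane the points live in. For real SSL embeddings, `points` would be the model's feature vectors; the quadratic loops here are only meant for small toy inputs.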

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
* Title, abstract and introduction amended to tone down the claims and generality of CLID, as requested by reviewers HDno and UReY.
* Removed W-CLID from Tables 1 and 2, as suggested by reviewer HDno.
* Fixed typos and improved reading flow, as suggested by all reviewers.

We also made the following addition:
* We have included comparisons of CLID with one-shot predictions, as suggested by reviewer UReY.

Reviewer UReY also suggested including confidence intervals over varying seeds and orderings of the training data. Following this suggestion, we ran some experiments; as we mention in our comments below, such variations result in exactly the same ranking of the models and the same Kendall coefficient.
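The Kendall coefficient measures rank agreement between a predictor's scores (for example, CLID) and the models' measured accuracies, so it depends only on how the models are ordered, not on the raw score values. That is why a perturbation that preserves the ranking leaves it unchanged. A minimal tau-a sketch follows; `kendall_tau` and all numbers are hypothetical, and the variant used in the paper (tau-a, or tau-b, which adjusts for ties and is the default of `scipy.stats.kendalltau`) is not stated here.

```python
from itertools import combinations

def kendall_tau(scores, accuracies):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    n = len(scores)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical predictor scores and accuracies (%) for four models.
scores = [0.10, 0.40, 0.35, 0.80]
accuracies = [60.0, 70.0, 72.0, 80.0]
tau = kendall_tau(scores, accuracies)  # 5 concordant pairs, 1 discordant

# Shifting every score keeps the model ranking, so tau does not change.
perturbed = [s + 0.01 for s in scores]
print(tau, kendall_tau(perturbed, accuracies))
```

Only the pair (model 1, model 2) is ranked differently by the predictor and by accuracy, giving tau = (5 - 1) / 6 = 2/3.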

Assigned Action Editor: ~Tongliang_Liu1

Submission Number: 621
