An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Foundation Models

Published: 03 Jul 2024 (Last Modified: 11 Jul 2024) · ICML 2024 FM-Wild Workshop Poster · License: CC BY 4.0
Keywords: foundation models, self-supervised learning, SSL, empirical, benchmark, domain shift, clustering, images, computer vision
TL;DR: We benchmark pretrained image models for clustering unseen, real data at distances progressively further from the training distribution
Abstract: Can foundation models generalize to new datasets outside their training domain, without any retraining? Our suite of benchmarking experiments uses encoders pretrained solely on ImageNet-1k, with either supervised or self-supervised training techniques, to cluster image datasets that were not seen during training using conventional clustering algorithms. This evaluation allows us to investigate the impact of the pretraining protocol on a model's ability to generalize outside its training domain, and to explore what the model natively prioritizes in its embeddings in a real-world scenario where novel data lacks labels. We find that supervised encoders typically offer more utility than SSL encoders within the training domain, and vice versa far outside of it; however, fine-tuned SSL encoders demonstrate the opposite trend.
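For concreteness, a minimal sketch of the evaluation pipeline described in the abstract is given below: embed an unseen dataset with a frozen ImageNet-1k-pretrained encoder, then cluster the embeddings with a conventional algorithm (k-means here). The specific encoder (ResNet-50), dataset (Flowers102), and scoring metric (adjusted mutual information) are illustrative assumptions, not necessarily the choices made in the paper.

```python
# Sketch: cluster an unseen dataset with a frozen ImageNet-1k-pretrained encoder.
# Encoder, dataset, and metric choices are illustrative, not the paper's exact setup.
import torch
import torchvision
from torchvision import transforms
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Supervised ImageNet-1k encoder; drop the classification head to expose embeddings.
encoder = torchvision.models.resnet50(weights="IMAGENET1K_V2")
encoder.fc = torch.nn.Identity()
encoder.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# An out-of-domain dataset the encoder never saw during training (illustrative choice).
dataset = torchvision.datasets.Flowers102(root="data", split="test",
                                          transform=preprocess, download=True)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, num_workers=4)

# Extract frozen embeddings and keep the ground-truth labels for scoring only.
features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(encoder(images.to(device)).cpu())
        labels.append(targets)
features = torch.cat(features).numpy()
labels = torch.cat(labels).numpy()

# Conventional clustering on the embeddings, one cluster per ground-truth class.
kmeans = KMeans(n_clusters=len(set(labels.tolist())), n_init=10, random_state=0)
pred = kmeans.fit_predict(features)

# Score agreement between predicted clusters and held-out labels.
print("AMI:", adjusted_mutual_info_score(labels, pred))
```

The same loop can be repeated with a self-supervised checkpoint (e.g. a DINO or MoCo ResNet-50) in place of the supervised weights to compare pretraining protocols, as the abstract describes.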
Submission Number: 100