An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

Tyler R. Scott; Ting Liu; Michael Curtis Mozer; Andrew Gallagher

Published: 06 Dec 2022, Last Modified: 05 May 2023, ICBINB poster, Readers: Everyone

Keywords: clustering, deep embeddings, retrieval, deep clustering, shallow clustering, computer vision

TL;DR: We study a suite of methods for clustering pretrained embeddings on datasets outside of face verification and find that the state-of-the-art deep methods are surprisingly fragile: they underperform shallow methods such as k-means.

Abstract: Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods (including $k$-means and hierarchical agglomerative clustering) underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often near ceiling), and the experimental methodology seemingly favors the deep methods. We conduct an empirical study of 14 clustering methods on two popular non-face datasets, Cars196 and Stanford Online Products, and obtain robust but contentious findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they underperform the shallow, heuristic-based methods. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved.
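The abstract uses Recall@1 to describe how well-separated a set of embeddings is. As a minimal sketch of that metric, not the authors' evaluation code, the snippet below scores each point by whether its nearest other point shares its class label. The function name `recall_at_1`, the toy data, and the choice of Euclidean distance are all assumptions made for illustration; the paper may use a different distance (for example cosine) and a different query/gallery split.

```python
import math


def recall_at_1(embeddings, labels):
    """Fraction of points whose nearest *other* point has the same label.

    Assumption: Euclidean distance, and every point is both query and gallery.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    if len(embeddings) < 2:
        raise ValueError("need at least two points to find a neighbor")

    hits = 0
    for i, query in enumerate(embeddings):
        best_j, best_dist = None, math.inf
        for j, candidate in enumerate(embeddings):
            if i == j:
                continue  # a point may not retrieve itself
            dist = math.dist(query, candidate)
            if dist < best_dist:
                best_j, best_dist = j, dist
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Hypothetical toy data: two classes with two points each.
toy_labels = ["a", "a", "b", "b"]
well_separated = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
overlapping = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.0), (1.1, 0.0)]

print(recall_at_1(well_separated, toy_labels))
print(recall_at_1(overlapping, toy_labels))
```

In the well-separated toy set every nearest neighbor shares the query's class, which is the near-ceiling regime the abstract attributes to face embeddings. In the overlapping set the nearest neighbor always belongs to the other class, a deliberately extreme stand-in for embeddings "with more uncertainty".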
