Keywords: clustering, deep embeddings, retrieval, deep clustering, shallow clustering, computer vision
TL;DR: We study a suite of methods for clustering pretrained embeddings on datasets beyond face verification and find that state-of-the-art deep methods are surprisingly fragile, underperforming shallow methods such as k-means.
Abstract: Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods, including $k$-means and hierarchical agglomerative clustering, underperform supervised, deep, inductive methods. While the reported improvements are impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative, that is, well-separated by class (Recall@1 above 90% and often near ceiling), and where the experimental methodology seemingly favors the deep methods. We conduct an empirical study of 14 clustering methods on two popular non-face datasets, Cars196 and Stanford Online Products, and obtain robust but contentious findings. Notably, deep methods are surprisingly fragile on embeddings with more uncertainty, where they underperform the shallow, heuristic-based methods. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods can be improved.
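The abstract uses Recall@1 as a proxy for how well-separated the embeddings are by class. As a rough illustration only, the sketch below computes Recall@1 under the common retrieval definition: the fraction of points whose nearest neighbor (excluding the point itself) shares its label. The Euclidean distance, the function name `recall_at_1`, and the toy 2-D points are assumptions made for this example; they are not taken from the paper's evaluation code.

```python
import math


def recall_at_1(embeddings, labels):
    """Fraction of points whose nearest other point has the same label.

    Assumes Euclidean distance; the paper may use a different metric.
    """
    hits = 0
    for i, query in enumerate(embeddings):
        best_j, best_dist = None, math.inf
        for j, candidate in enumerate(embeddings):
            if i == j:
                continue  # a point may not retrieve itself
            dist = math.dist(query, candidate)
            if dist < best_dist:
                best_j, best_dist = j, dist
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Hypothetical toy data: two tight, well-separated classes.
separated = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
# Same labels, but the two classes are interleaved along a line.
overlapping = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.0), (1.4, 0.0)]
labels = [0, 0, 1, 1]

print(recall_at_1(separated, labels))
print(recall_at_1(overlapping, labels))
```

In the first toy set every point's nearest neighbor belongs to its own class, which loosely mirrors the near-ceiling regime described for face embeddings; in the second, nearest neighbors cross class boundaries, a stand-in for the "more uncertainty" regime.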