TL;DR: Our method, CLUDI, uses diffusion models to learn continuous latent embeddings for clustering, achieving state-of-the-art unsupervised classification on diverse datasets.
Abstract: Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher–student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions.
Lay Summary: Deep-learning methods for grouping unlabeled images often fall into one of two pitfalls: they either collapse to a single, undifferentiated group or rely on heavy data-augmentation tricks. Our approach, Clustering via Diffusion (CLUDI), breaks that pattern by exploiting the incremental denoising used by modern diffusion generators. Guided by Vision Transformer features, the diffusion process begins with pure noise and gradually sculpts a compact embedding whose coordinates act like votes for each class. Re-running the process with different noise seeds produces diverse candidate class assignments. The model is trained with a teacher-student siamese architecture, in which a self-supervised student network distills the teacher's noisy, diverse assignments into a single stable assignment. Evaluated across multiple datasets, including challenging ImageNet subsets and several other vision benchmarks, CLUDI consistently outperforms strong Vision Transformer baselines in clustering accuracy, normalized mutual information, and adjusted Rand index. By turning a generative diffusion model into a dependable classifier, CLUDI offers researchers and engineers a novel, annotation-free way to uncover structure in massive image collections.
Primary Area: General Machine Learning->Clustering
Keywords: Self-supervised, Clustering, Diffusion
Submission Number: 10498
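Note on metrics: the lay summary reports clustering accuracy and adjusted Rand index, both of which compare predicted cluster IDs against ground-truth classes without assuming the IDs line up. The sketch below shows how these two scores are commonly computed; it is an illustration only, not the authors' evaluation code. The function names `clustering_accuracy` and `adjusted_rand_index` are hypothetical, and the brute-force matching over permutations is an assumption that only suits a small number of clusters (larger problems typically use Hungarian matching instead).

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Brute-force search over mappings; assumes there are no more predicted
    clusters than ground-truth classes, and that both counts are small.
    """
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    if len(pred_labels) > len(true_labels):
        raise ValueError("more predicted clusters than classes")
    # counts[(cluster, cls)] = number of samples in that cluster with that class
    counts = Counter(zip(y_pred, y_true))
    best_hits = 0
    for mapping in permutations(true_labels, len(pred_labels)):
        hits = sum(counts[(p, t)] for p, t in zip(pred_labels, mapping))
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    total_pairs = comb(len(y_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat as trivially identical
    same_both = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(y_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = same_true * same_pred / total_pairs
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    clusters = [1, 1, 0, 0, 2, 2]  # same partition, different cluster IDs
    print(clustering_accuracy(truth, clusters))
    print(adjusted_rand_index(truth, clusters))
```

Both scores are invariant to how cluster IDs are numbered, which is why they suit unsupervised classification, where predicted IDs carry no inherent class meaning.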