Abstract: The lack of annotated data often limits the training of machine learning models, and during the labelling process some data points may remain unlabelled. While unsupervised methods such as clustering can reveal the underlying structure of the data, they are typically unsuitable for placing new samples into existing clusters. Here, we propose Spectral Clustering for Unsupervised decision Tree (SCUT), a novel hierarchical clustering method based on algebraic connectivity that can position new data points appropriately within the clustering structure. By leveraging a feature-splitting approach, SCUT also enables straightforward extraction of ante-hoc explanations for its clustering decisions. Formally, SCUT recursively splits the data by solving the Normalized Cut (NCUT) problem on a bipartite graph; NCUT is a graph-partitioning formulation that seeks to split a graph into balanced subsets while minimizing the total connection strength between them. We demonstrate, both visually and quantitatively, that SCUT captures the intrinsic structure of data more effectively than existing methods, while offering competitive performance compared to common hierarchical clustering algorithms.
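For context, the recursive NCUT-based splitting described in the abstract builds on the standard spectral relaxation of the normalized cut. The sketch below is a minimal, generic illustration of that relaxation (thresholding the generalised Fiedler vector of the graph Laplacian), not the authors' SCUT implementation; the affinity construction (rbf_affinity, gamma) and the toy data are assumptions introduced only for demonstration.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def rbf_affinity(X, gamma=1.0):
    """Dense RBF (Gaussian) affinity matrix between the rows of X."""
    return np.exp(-gamma * cdist(X, X, metric="sqeuclidean"))

def ncut_bipartition(W):
    """Approximate normalized-cut bipartition of a weighted graph.

    Uses the standard spectral relaxation: the eigenvector associated
    with the second-smallest generalised eigenvalue of
    (D - W) v = lambda * D v, thresholded at its median.

    W : (n, n) symmetric, non-negative affinity matrix.
    Returns a boolean mask selecting one side of the split.
    """
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                 # unnormalised graph Laplacian
    _, vecs = eigh(L, D)      # generalised eigenproblem, ascending eigenvalues
    fiedler = vecs[:, 1]      # second-smallest eigenvector (Fiedler vector)
    return fiedler >= np.median(fiedler)

# Toy usage: split two well-separated Gaussian blobs into two groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
side = ncut_bipartition(rbf_affinity(X, gamma=2.0))
print(side.astype(int))
```

Applied recursively to each resulting subset, this kind of bipartition yields a hierarchical (tree-structured) clustering; SCUT differs in that, per the abstract, the cut is computed on a bipartite graph and drives feature-based splits.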
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Bernhard_C_Geiger1
Submission Number: 6704