Abstract: Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, $k$, or to set thresholds that implicitly determine $k$. We introduce \method, a novel clustering algorithm that requires neither $k$ nor any other parameter. Instead, it formulates clustering as minimising the length of a three-part encoding of the data. It uses this formulation to determine the optimal number of clusters, $k^*$, by splitting and merging clusters while also optimising the standard $k$-means objective. We prove that \method is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods when $k$ is unknown. We also show that it accurately estimates $k$ and that its empirical runtime is competitive with existing methods and scales well with dataset size.
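For readers less familiar with the "standard $k$-means objective" the abstract refers to, the sketch below computes it (the sum of squared Euclidean distances from each point to its assigned centroid) and performs one Lloyd iteration (assign, then update) that does not increase it. This is only an illustration of the standard objective under the assumption of Euclidean data in a NumPy array. It is not \method itself: the three-part encoding and the split/merge steps are not described in the abstract and are not reproduced here. The function names `kmeans_objective` and `lloyd_step` are hypothetical.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Standard k-means objective: total squared distance of points to their centroids."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

def lloyd_step(X, centroids):
    """One Lloyd iteration (hypothetical helper, for illustration only)."""
    # Assignment step: squared distance from every point to every centroid,
    # then pick the nearest centroid for each point.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points;
    # an empty cluster keeps its previous centroid.
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids

# Usage: two well-separated pairs of points, k = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
init = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, centroids = lloyd_step(X, init)
print(labels.tolist(), kmeans_objective(X, labels, centroids))
```

With a fixed $k$, repeating `lloyd_step` until the assignments stop changing converges to a local minimum of this objective. Choosing $k$ itself is the part the abstract says \method handles by splitting and merging clusters.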
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hongfu_Liu2
Submission Number: 7279