k∗means: A Parameter-free Clustering Algorithm


TMLR Paper7279 Authors

01 Feb 2026 (modified: 06 Feb 2026), under review for TMLR, CC BY 4.0

Abstract: Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, $k$, or by relying on thresholds that implicitly determine $k$. We introduce k∗means, a novel clustering algorithm that eliminates the need to set $k$ or any other parameters. Instead, it formulates clustering as the problem of minimising the length of a three-part encoding of the data. It uses this formulation to determine the optimal number of clusters, $k^*$, by splitting and merging clusters while also optimising the standard $k$-means objective. We prove that k∗means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where $k$ is unknown. We also show that it accurately estimates $k$ and that, empirically, its runtime is competitive with existing methods and scales well with dataset size.
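The abstract does not specify the three-part encoding, so the sketch below is a hypothetical toy and not the k∗means algorithm. It only illustrates how a minimum-description-length style cost can select $k$ without a user-set threshold. It assumes (1) a model part that sends each centroid coordinate with a uniform code over the data range at a fixed, hypothetical precision `delta`, (2) an assignment part of $\log_2 k$ bits per point, and (3) a residual part coded under a shared isotropic Gaussian quantised at `delta`. Instead of splitting and merging clusters, it simply runs Lloyd's $k$-means for every $k$ and keeps the cheapest. All names (`lloyd`, `description_length`, `choose_k`, `delta`) are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def description_length(points: List[Point], centroids: List[Point], labels: List[int],
                       delta: float = 0.01) -> float:
    """Toy three-part code length in bits (hypothetical encoding, not the paper's)."""
    n, d, k = len(points), len(points[0]), len(centroids)
    # Part 1, model: each centroid coordinate, uniform code over the data range at precision delta.
    spans = [max(p[j] for p in points) - min(p[j] for p in points) for j in range(d)]
    model_bits = k * sum(math.log2(max(s, delta) / delta) for s in spans)
    # Part 2, assignments: one cluster index per point, log2(k) bits each.
    label_bits = n * math.log2(k)
    # Part 3, residuals: shared isotropic Gaussian, quantised at precision delta.
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    # Floor at the variance of uniform quantisation error so the log stays finite when sse == 0.
    var = max(sse / (n * d), delta ** 2 / 12)
    residual_bits = 0.5 * n * d * math.log2(2 * math.pi * math.e * var / delta ** 2)
    return model_bits + label_bits + residual_bits


def choose_k(points: List[Point], k_max: Optional[int] = None) -> Tuple[int, Dict[int, float]]:
    """Brute-force sweep over k (a stand-in for the paper's split/merge search)."""
    k_max = k_max or len(points)
    costs = {k: description_length(points, *lloyd(points, k)) for k in range(1, k_max + 1)}
    return min(costs, key=costs.get), costs


# Three tight, well-separated blobs of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
best_k, costs = choose_k(data)
print(best_k)  # 3
```

On this data the toy cost is lowest at $k = 3$: fewer clusters leave large residuals, while more clusters pay more for extra centroids and longer labels than they save on residuals. A different precision or encoding can shift that balance, which is why the choice of encoding matters.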

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Hongfu_Liu2

Submission Number: 7279
