K^*-means: a parameter-free clustering algorithm

ICLR 2026 Conference Submission16911 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, Readers: Everyone, License: CC BY 4.0
Keywords: clustering, minimum description length
TL;DR: A new algorithm that functions like k-means but does not require knowing k or setting any threshold or other parameters
Abstract: Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, k, or by reliance on thresholds that implicitly determine k. We introduce k*-means, a novel clustering algorithm that eliminates the need to set k or any other parameters. Instead, it uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters while simultaneously optimizing the standard k-means objective. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown. We also show that it estimates k accurately and that, empirically, its runtime is competitive with existing methods and scales well with dataset size.
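To make the idea of picking k by minimum description length concrete, here is a minimal sketch. It is not the k*-means algorithm from the submission: it scans candidate values of k by brute force instead of splitting and merging clusters, it works only on 1-D data, and the cost formula (a fixed `FLOAT_BITS` per centroid, `log2(k)` bits per cluster index, and a shared-variance Gaussian code for the residuals) is an assumption chosen for illustration. The names `kmeans_1d`, `description_length`, and `choose_k` are hypothetical.

```python
import math
import random

FLOAT_BITS = 32  # assumed cost, in bits, of storing one centroid coordinate


def kmeans_1d(points, k, iters=100):
    """Plain Lloyd's algorithm on 1-D data with farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(abs(p - c) for c in centroids)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = sum(members) / len(members)
    return centroids, labels


def description_length(points, centroids, labels):
    """Illustrative two-part code length in bits (an assumed formula)."""
    n, k = len(points), len(centroids)
    sse = sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))
    variance = max(sse / n, 1e-12)  # floor avoids log(0) on perfect fits
    model_bits = k * FLOAT_BITS                # cost of the centroids
    index_bits = n * math.log2(k)              # cost of each point's cluster index
    residual_bits = 0.5 * n * math.log2(2 * math.pi * math.e * variance)
    return model_bits + index_bits + residual_bits


def choose_k(points, k_max):
    """Brute-force scan: return the k in 1..k_max with the smallest code length."""
    costs = {}
    for k in range(1, k_max + 1):
        centroids, labels = kmeans_1d(points, k)
        costs[k] = description_length(points, centroids, labels)
    return min(costs, key=costs.get), costs


# Synthetic data: three well-separated 1-D clusters of 50 points each.
rng = random.Random(0)
data = [rng.gauss(centre, 0.5) for centre in (0.0, 10.0, 20.0) for _ in range(50)]
best_k, costs = choose_k(data, k_max=6)
print(best_k)
```

The trade-off this sketch illustrates is the one the abstract relies on: adding a cluster lowers the residual cost but raises the cost of the centroids and the cluster indices, so the total code length is minimized at an intermediate k without a user-set threshold. On the well-separated synthetic data above, the scan should prefer three clusters.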
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16911