Balanced k-Means Revisited

09 Jan 2023 (modified: 17 Sept 2024), Rejected by TMLR, CC BY 4.0
Abstract: The k-means algorithm aims to minimize the variance within clusters without considering the balance of cluster sizes. Balanced k-means formulates the partition as a pairing problem that forces the cluster sizes to be strictly balanced, but the resulting algorithm is impractically slow at O(n^3). Regularized k-means addresses the problem by adding a regularization term that includes a balance parameter. It works reasonably well when balanced cluster sizes are a mandatory requirement but does not generalize well to soft balance requirements. In this paper, we revisit the k-means algorithm as a two-objective optimization problem with two conflicting goals: minimizing the variance within clusters and minimizing the difference in cluster sizes. The proposed algorithm implements a balance-driven variant of k-means that initially focuses only on minimizing the variance but adds more weight to the balance constraint in each iteration. The resulting degree of balance is determined not by a control parameter that has to be tuned, but by the point of termination, which can be precisely specified by a balance criterion.
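The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the penalty form (squared distance plus `penalty` times the current cluster size), the geometric `growth` schedule, the initial penalty scale, and the function name `balance_driven_kmeans` are all assumptions made for illustration. The sketch starts from a plain nearest-centroid assignment, increases the weight on balance in each iteration, and stops as soon as a balance criterion (largest minus smallest cluster size at most `max_size_diff`) is met.

```python
import numpy as np

def balance_driven_kmeans(X, k, max_size_diff=1, growth=1.1, max_iter=500, seed=0):
    """Illustrative sketch only: k-means with a size penalty that grows per iteration."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)

    # Initial assignment uses squared distance only (no balance term).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sizes = np.bincount(labels, minlength=k)

    # Assumed starting weight for the balance term, scaled to the data.
    penalty = max(d2.min(axis=1).mean() / n, 1e-12)

    for _ in range(max_iter):
        # Balance criterion: terminate once cluster sizes are close enough.
        if sizes.max() - sizes.min() <= max_size_diff:
            break
        # Reassign points one at a time so cluster sizes stay up to date.
        for i in rng.permutation(n):
            sizes[labels[i]] -= 1
            cost = ((X[i] - centroids) ** 2).sum(axis=1) + penalty * sizes
            labels[i] = cost.argmin()
            sizes[labels[i]] += 1
        # Standard centroid update; an empty cluster keeps its old centroid.
        for j in range(k):
            if sizes[j] > 0:
                centroids[j] = X[labels == j].mean(axis=0)
        # Put more weight on balance in the next iteration.
        penalty *= growth

    return labels, centroids
```

In this sketch the final degree of balance is set by `max_size_diff` rather than by a tuned trade-off weight: a larger value stops the loop earlier, while the clusters are still shaped mostly by variance.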
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Brian_Kulis1
Submission Number: 759