Abstract: The k-means algorithm aims to minimize the variance within clusters without considering
the balance of cluster sizes. Balanced k-means formulates the partition as a pairing problem that
forces the cluster sizes to be strictly balanced, but the resulting algorithm is impractically
slow, with O(n^3) time complexity. Regularized k-means addresses this problem with a regularization term
that includes a balance parameter. It works reasonably well when balanced cluster sizes
are a mandatory requirement but does not generalize well to soft balance requirements. In
this paper, we revisit the k-means algorithm as a bi-objective optimization problem with
two conflicting goals: to minimize the variance within clusters and to minimize
the difference in cluster sizes. The proposed algorithm implements a balance-driven variant
of k-means that initially focuses only on minimizing the variance but gives more weight
to the balance constraint in each iteration. The resulting degree of balance is determined not
by a control parameter that has to be tuned, but by the point of termination, which can be
specified precisely by a balance criterion.
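The abstract does not give the exact cost function, so the following is only a minimal sketch of the balance-driven idea under stated assumptions, not the authors' implementation. It assumes that the cost of assigning a point to cluster j is the squared distance to the centroid plus w times the number of points already assigned to j in the current pass. The weight w starts at 0, which makes the first iteration a plain k-means step, and grows by a fixed step in every iteration. The balance criterion is assumed to be "largest minus smallest cluster size is at most max_diff". The function balance_driven_kmeans and its parameters (max_diff, weight_step, max_iter, seed) are hypothetical names chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balance_driven_kmeans(points, k, max_diff=1, weight_step=None,
                          max_iter=10_000, seed=0):
    """Hypothetical sketch of a balance-driven k-means.

    Assumed assignment cost of point p to cluster j:
        sq_dist(p, centroid_j) + w * size_j
    where size_j counts points already assigned to j in the current pass.
    w starts at 0 (plain k-means) and grows by weight_step per iteration.
    The loop stops once max(size) - min(size) <= max_diff (assumed balance criterion).
    Returns (labels, centroids, iterations_run).
    """
    n, dim = len(points), len(points[0])
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    if weight_step is None:
        # Assumed default: scale the step to the data's mean squared spread / n.
        mean = [sum(p[d] for p in points) / n for d in range(dim)]
        weight_step = (sum(sq_dist(p, mean) for p in points) / n / n) or 1.0
    w = 0.0
    labels = [0] * n
    for it in range(max_iter):
        sizes = [0] * k
        # Sequential assignment: larger clusters become more expensive as w grows.
        for i, p in enumerate(points):
            j = min(range(k),
                    key=lambda c: sq_dist(p, centroids[c]) + w * sizes[c])
            labels[i] = j
            sizes[j] += 1
        # Standard centroid update; an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
        # Termination by the balance criterion, not by a tuned weight.
        if max(sizes) - min(sizes) <= max_diff:
            return labels, centroids, it + 1
        w += weight_step  # give the balance term more weight each iteration
    return labels, centroids, max_iter
```

In this sketch, max_diff plays the role of the balance criterion. A loose value such as max_diff = n stops after the first, unpenalized k-means iteration, while max_diff = 1 keeps raising w until the clusters are as balanced as the point count allows. Once w exceeds the largest possible distance difference, each point goes to a currently smallest cluster, so the loop is guaranteed to reach that state.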
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Brian_Kulis1
Submission Number: 759