An Efficient Split-Merge Re-Start for the $K$-Means Algorithm

Marco Capó; Aritz Pérez; José Antonio Lozano

Published: 01 Jan 2022, Last Modified: 07 Oct 2024. Venue: IEEE Trans. Knowl. Data Eng. 2022. License: CC BY-SA 4.0

Abstract: The $K$-means algorithm is one of the most popular clustering methods. However, its performance, in terms of both the quality of the obtained solution and the computational load, is well known to depend heavily on its initialization phase. For this reason, different initialization techniques have been developed over the years to enable fast convergence to competitive solutions. It is common practice to re-start the $K$-means algorithm several times via one of these techniques and keep the solution with the lowest error. Unfortunately, the solution chosen this way is still likely to be a poor approximation of the optimal set of centroids. In this article, we introduce a cheap Split-Merge step that can be used to re-start the $K$-means algorithm after it reaches a fixed point. Under some settings, one can show that this approach reduces the error of the given fixed point without requiring any further iteration of the $K$-means algorithm. Moreover, experimental results show that this strategy generates approximations with an associated error that is hard to reach for different multi-start methods, such as multi-start Forgy $K$-means, $K$-means++ and Hartigan $K$-means, while also computing fewer distances than these algorithms.
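For readers unfamiliar with the multi-start baseline mentioned in the abstract, the following is a minimal sketch of it, not the Split-Merge step itself (whose details are not given here). It assumes Forgy initialization (random data points as initial centroids), plain Lloyd iterations, and the sum of squared distances as the error. All function names (`sq_dist`, `lloyd`, `multi_start_kmeans`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans_error(points, centroids):
    """K-means error: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, max_iter=100):
    """Lloyd iterations until the centroids stop changing (a fixed point)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(old)
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def multi_start_kmeans(points, k, restarts=10, seed=0):
    """Run K-means from several Forgy initializations, keep the lowest error."""
    rng = random.Random(seed)
    best_centroids, best_error = None, float("inf")
    for _ in range(restarts):
        init = rng.sample(points, k)  # Forgy: k distinct data points
        centroids = lloyd(points, init)
        error = kmeans_error(points, centroids)
        if error < best_error:
            best_centroids, best_error = centroids, error
    return best_centroids, best_error


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, error = multi_start_kmeans(data, k=2)
    print(centroids, error)
```

Each re-start in this sketch repeats the full set of point-to-centroid distance computations from scratch, which is the cost the abstract contrasts with a re-start applied directly to an already reached fixed point.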
