Accelerating K-Means by Grouping Points Automatically

Qiao Yu, Bi-Ru Dai

Published: 01 Jan 2017, Last Modified: 26 May 2024, DaWaK 2017, CC BY-SA 4.0

Abstract: K-means is a well-known clustering algorithm in data mining and machine learning, widely applied in domains such as computer vision, market segmentation, and social network analysis. However, k-means spends a large amount of time on unnecessary distance calculations, so accelerating it has become an important research topic. Accelerated exact k-means algorithms produce the same result as standard k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means, which is significantly faster than state-of-the-art accelerated k-means algorithms and consumes much less additional memory than they do. Fission-Fusion k-means accelerates k-means by automatically grouping points during the iterations, which lets it balance the cost of distance calculations against the cost of filtering. Extensive experiments on real-world datasets show that Fission-Fusion k-means considerably outperforms the state-of-the-art accelerated k-means algorithms, especially when the datasets are low-dimensional and the number of clusters is large. In addition, on datasets that are more separated and naturally clustered, our algorithm is comparatively faster than other accelerated k-means algorithms.
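The abstract does not describe how Fission-Fusion k-means groups points, so the sketch below does not implement it. It only illustrates the general idea that exact accelerated k-means relies on: skip distance calculations that provably cannot change an assignment, and still obtain exactly the same result as standard k-means. As a stand-in it uses the well-known triangle-inequality test from Elkan's accelerated k-means, a different technique from the one proposed in this paper. All function names (`assign_lloyd`, `assign_pruned`, `kmeans`, `make_blobs`) and the synthetic data are hypothetical and chosen for illustration.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_lloyd(points, centers):
    """Standard k-means assignment: compute every point-to-center distance."""
    labels, count = [], 0
    for p in points:
        best, best_d = 0, math.inf
        for j, c in enumerate(centers):
            d = dist(p, c)
            count += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, count

def assign_pruned(points, centers):
    """Exact assignment with the triangle-inequality test used by Elkan's algorithm
    (a stand-in, NOT Fission-Fusion): if d(c_best, c_j) >= 2 * d(p, c_best),
    then c_j cannot be strictly closer to p, so d(p, c_j) is never computed."""
    k = len(centers)
    cc = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            cc[i][j] = cc[j][i] = dist(centers[i], centers[j])
    count = k * (k - 1) // 2  # center-to-center distances are not free
    labels = []
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        count += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                continue  # filtered: c_j provably cannot beat the current best
            d = dist(p, centers[j])
            count += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, count

def kmeans(points, k, assign, iters=10, seed=0):
    """Run a fixed number of k-means iterations with the given assignment step.
    Returns final labels, final centers, and the total number of distance calculations."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]  # Forgy initialization
    total, labels = 0, []
    for _ in range(iters):
        labels, count = assign(points, centers)
        total += count
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, total

def make_blobs(rng, grid=(5, 4), per=100, spread=10.0, sigma=0.5):
    """Hypothetical low-dimensional, well-separated data: one Gaussian blob per grid cell."""
    pts = []
    for gx in range(grid[0]):
        for gy in range(grid[1]):
            for _ in range(per):
                pts.append((gx * spread + rng.gauss(0, sigma),
                            gy * spread + rng.gauss(0, sigma)))
    return pts

if __name__ == "__main__":
    pts = make_blobs(random.Random(42))
    lab_a, cen_a, n_a = kmeans(pts, 20, assign_lloyd)
    lab_b, cen_b, n_b = kmeans(pts, 20, assign_pruned)
    print("same labels:", lab_a == lab_b)
    print("distance calculations, plain vs pruned:", n_a, n_b)
```

Because a center is skipped only when it provably cannot be strictly closer than the current best, both runs make identical assignments in every iteration and therefore end with the same labels and centers (barring floating-point ties). The pruned run computes fewer distances on this kind of low-dimensional, well-separated data with many clusters. Practical accelerated algorithms go further by keeping per-point bounds across iterations, and that bookkeeping is where their additional memory consumption comes from.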