Abstract: Clustering is the process of partitioning data into clusters with the goal of minimizing the dissimilarity among objects within each cluster, where the commonly used evaluation function is defined as the sum of the squared distances from each point to the center of the cluster to which it belongs. Nevertheless, this general evaluation function is extremely vulnerable to outliers and noisy data, and it is sensitive to the initial cluster centers. More seriously, this evaluation function cannot effectively capture the essence of a good clustering: even if a partition achieves the global optimum of the evaluation function, the clustering result may still be poor. In this study, we propose a multi-start local search algorithm (MLS) with several techniques to tackle this problem. First, the center of each cluster is no longer its centroid, which reduces the algorithm's dependence on the size and shape of the ideal clusters. Second, the number of adjacent points shared between clusters is defined as the new objective function. Third, two basic meta-operations, merge and split, are used to optimize the objective function and make the iterative process insensitive to the initial solution. The novelty of our approach lies in the selection criterion for the initial centers and the new objective function, which together enable MLS to explore more promising search areas. Experimental results demonstrate that MLS outperforms traditional centroid-based clustering algorithms in terms of both solution quality and computational efficiency, and that it is competitive with other reference algorithms such as spectral-, density-, and geometry-based clustering algorithms.
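To make the critique concrete, the following is a minimal sketch of the standard sum-of-squared-errors (SSE) objective that the abstract argues against; the function name and data are illustrative and not taken from the paper.

```python
# Illustrative sketch (not the paper's method): the classical SSE
# clustering objective -- the sum of squared Euclidean distances from
# each point to the center of its assigned cluster.

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    total = 0.0
    for p, k in zip(points, labels):
        c = centers[k]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

# Toy data: two tight points in cluster 0, one point in cluster 1.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 1]
centers = [(0.5, 0.0), (10.0, 10.0)]
print(sse(points, labels, centers))  # 0.25 + 0.25 + 0.0 = 0.5
```

A single distant outlier added to cluster 0 would dominate this sum, illustrating the sensitivity to outliers that motivates replacing SSE with a shared-neighbor objective.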