Abstract: Clustering is a typical unsupervised data analysis method, which divides a given data set without label information into multiple clusters. The data on each cluster has a great deal of association, which can be used as the preprocessing stage of other algorithms or for further association analysis. Therefore, clustering plays an important role in a wide range of fields. Chameleon is a clustering algorithm that combines the relative interconnectivity and relative closeness to find clusters of arbitrary shape with high quality. However, the graph-partitioning technology hMETIS algorithm used in the algorithm is difficult to operate and easy to cause uncertainty of results. In addition, the final number of clusters need to be specified by user as a parameter to stop merging, which is difficult to determine without prior information. Aiming at these shortcomings, Chameleon algorithm based on mutual k-nearest neighbors (MChameleon) is proposed. Firstly, the idea of mutual k-nearest neighbors is introduced to directly generate sub-clusters, which omits the process of partitioning graph. Then, the concept of MC modularity is introduced, which is used to objectively identify the final clustering results. By experiments on artificial data sets and UCI data sets, we compared MChameleon with the original Chameleon algorithm, the improved AChameleon algorithm and the classic K-Means, DBSCAN, BIRCH algorithm in accuracy. Experimental results on data sets show that Chameleon algorithm based on mutual k-nearest neighbors has great advantages and is feasible.
Loading