Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that has been widely studied and applied in recent years because it has a single parameter, is non-iterative, and is robust. However, it cannot always identify cluster centers effectively, and its time and space complexities are high. To address these problems, this paper proposes a fast density peaks clustering algorithm based on approximate k-nearest neighbors (FDPAN). First, it uses the Balanced K-means based Hierarchical K-means (BKHK) method to partition the data and quickly find the approximate k-nearest neighbors (AKNN), improving the algorithm’s efficiency on large-scale, high-dimensional data. Three-way clustering is used to improve the neighbor search for points on the boundaries of the partitions. Then, the local density and relative distance of DPC are recomputed from the AKNN. Finally, following the similar density chain, connected high-density points are labeled while the cluster centers are being searched for, and each remaining point is assigned to the cluster of its nearest higher-density point. Theoretical analysis and experiments on synthetic and real datasets show that, compared with DPC and its variants, FDPAN obtains better clustering results and shorter running times on large-scale, high-dimensional data.
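To make the DPC quantities mentioned in the abstract concrete, the following is a minimal sketch of computing a local density and a relative distance from k-nearest neighbors, then assigning each non-center point to the cluster of its nearest higher-density point. It is not the FDPAN implementation: exact brute-force neighbors stand in for the BKHK-based AKNN search, the three-way boundary handling and the similar density chain are omitted, and the density formula, the function names `knn_density_peaks` and `assign_labels`, and the parameters `k` and `n_clusters` are assumptions made for illustration.

```python
import numpy as np


def knn_density_peaks(X, k=5):
    """Return (rho, delta, parent, rank) for the points in X.

    Assumption: exact kNN replaces the paper's approximate kNN, and the
    density rho = exp(-mean kNN distance) is an illustrative choice.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Full pairwise Euclidean distances: O(n^2) time and memory, which is
    # the cost that a partition-based neighbor search is meant to avoid.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # k nearest neighbors of every point, excluding the point itself.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    knn_idx = np.argsort(masked, axis=1)[:, :k]
    knn_dist = np.take_along_axis(masked, knn_idx, axis=1)

    # Local density: larger when the k neighbors are closer.
    rho = np.exp(-knn_dist.mean(axis=1))

    # Relative distance: distance to the nearest point ranked higher in density.
    rank = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for pos, i in enumerate(rank):
        if pos == 0:
            # The densest point has no higher-density neighbor.
            delta[i] = dist[i].max()
            continue
        higher = rank[:pos]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    return rho, delta, parent, rank


def assign_labels(rho, delta, parent, rank, n_clusters):
    """Pick centers by the largest rho * delta, then propagate labels."""
    gamma = rho * delta
    # Stable sort in density order, so the densest point is always a center.
    top = np.argsort(-gamma[rank], kind="stable")[:n_clusters]
    centers = rank[top]
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(len(centers))
    # Visit points from dense to sparse, so each parent is labeled first.
    for i in rank:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `knn_density_peaks(X, k)` and passing its outputs to `assign_labels(..., n_clusters)` gives one label per point. In this sketch the number of clusters is supplied by the caller, whereas the abstract describes locating centers while labeling connected high-density points.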
External IDs: dblp:journals/tkde/DingLXGDW25