Efficient Density-peaks Clustering Algorithms on Static and Dynamic Data in Euclidean Space

Published: 01 Jan 2024, Last Modified: 25 Jan 2025ACM Trans. Knowl. Discov. Data 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Clustering multi-dimensional points is a fundamental task in many fields, and density-based clustering supports many applications because it can discover clusters of arbitrary shapes. This article addresses the problem of Density-Peaks Clustering (DPC) in Euclidean space. DPC already has many applications, but its straightforward implementation incurs O(n2) time, where n is the number of points, thereby does not scale to large datasets. To enable DPC on large datasets, we first propose empirically efficient exact DPC algorithm, Ex-DPC. Although this algorithm is much faster than the straightforward implementation, it still suffers from O(n2) time theoretically. We hence propose a new exact algorithm, Ex-DPC++, that runs in o(n2) time. We accelerate their efficiencies by leveraging multi-threading. Moreover, real-world datasets may have arbitrary updates (point insertions and deletions). It is hence important to support efficient cluster updates. To this end, we propose D-DPC for fully dynamic DPC. We conduct extensive experiments using real datasets, and our experimental results demonstrate that our algorithms are efficient and scalable.
Loading