Efficient Anytime Density-based Clustering

Christian Böhm, Jing Feng, Xiao He, Son T. Mai

2013 (modified: 24 Apr 2023)SDM 2013Readers: Everyone

Abstract: Many clustering algorithms suffer from scalability problems on massive datasets and do not support any user interaction during runtime. To tackle these problems, anytime clustering algorithms are proposed. They produce a fast approximate result which is continuously refined during the further run. Also, they can be stopped or suspended anytime and provide an answer. In this paper, we propose a novel anytime clustering algorithm based on the density-based clustering paradigm. Our algorithm called A-DBSCAN is applicable to very high dimensional databases such as time series, trajectory, medical data, etc. The general idea of our algorithm is to use a sequence of lower-bounding functions (LBs) of the true similarity measure to produce multiple approximate results of the true density-based clusters. A-DBSCAN operates in multiple levels w.r.t. the LBs and is mainly based on two algorithmic schemes: (1) an efficient distance upgrade scheme which restricts distance calculations to core-objects at each level of the LBs; (2) a local re-clustering scheme which restricts update operations to the relevant objects only. Extensive experiments demonstrate that A-DBSCAN acquires very good clustering results at very early stages of execution thus saves a large amount of computational time. Even if it runs to the end, A-DBSCAN is still orders of magnitude faster than DBSCAN.

0 Replies