Abstract: Hierarchical clustering has been extensively applied in data analysis and knowledge discovery. However, the scalability of hierarchical clustering methods is generally limited by their O(n²) time complexity, where n is the size of the input data. To address this issue, we present a fast and accurate hierarchical clustering algorithm based on topology training. Specifically, a trained multilayer topological structure that fits the spatial distribution of the data is used to accelerate the similarity measurement, which dominates the computational cost of hierarchical clustering. The topological structure also guides the merging steps so that they form a meaningful and accurate clustering result. In addition, an incremental version of the proposed algorithm is designed so that the approach is also applicable to streaming data. Experimental results on various data sets demonstrate the efficiency and effectiveness of the proposed algorithms.
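To make the quadratic cost mentioned above concrete, the following minimal sketch computes the similarity of every pair of points, which is the step a conventional agglomerative method starts from. It is not the proposed topology-based algorithm; the function names `pairwise_distances` and `closest_pair`, the use of Euclidean distance, and the sample points are assumptions chosen purely for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of points.

    For n points this performs n * (n - 1) / 2 distance computations,
    which is the source of the O(n^2) cost in the baseline setting.
    """
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def closest_pair(points):
    """Return the index pair with the smallest distance (first merge candidate)."""
    dists = pairwise_distances(points)
    return min(dists, key=dists.get)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of two points each.
    points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
    print(len(pairwise_distances(points)))  # 6 pairs for n = 4
    print(closest_pair(points))             # (0, 1)
```

Doubling the number of points roughly quadruples the number of pairs, which is why an approach that avoids exhaustive pairwise comparison, such as the trained topological structure described in the abstract, can help at scale.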