Scaling Up Mass-Based Clustering

Published: 01 Jan 2022, Last Modified: 10 Apr 2024, CIKM 2022, Readers: Everyone
Abstract: This paper addresses the problem of scaling up the mass-based clustering paradigm to handle large datasets. The existing algorithm MBScan computes and stores all pairwise distances, resulting in quadratic time and space complexity. However, we observe that mass-based clustering requires information about only a tiny fraction of all possible data point pairs. We propose three optimizations to MBScan for quickly finding such pairs and computing their distances. We empirically evaluate our work on ten real-world and synthetic datasets. Our experiments show that our approach results in fast and memory-efficient clustering with no loss in the quality of clusters.
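The contrast between examining all pairs and examining only nearby pairs can be illustrated with a generic sketch. This is not the paper's method and does not reproduce any of its three optimizations: it uses plain Euclidean distance in 2D as a stand-in for the mass-based dissimilarity, and a uniform grid as one well-known way to skip distant pairs. The function names, the radius `eps`, and the random data are all assumptions made for illustration.

```python
import itertools
import math
import random
from collections import defaultdict


def all_pairs_within(points, eps):
    """Brute force: examine every pair, as a full distance matrix would."""
    close = set()
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            close.add((i, j))
    return close


def grid_pairs_within(points, eps):
    """Bucket points into square cells of side eps, then compare only
    points that fall in the same or an adjacent cell."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    close = set()
    examined = 0  # number of pairs whose distance was actually computed
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbours = grid.get((cx + dx, cy + dy), ())
                for i in members:
                    for j in neighbours:
                        if i < j:  # visit each unordered pair once
                            examined += 1
                            if math.dist(points[i], points[j]) <= eps:
                                close.add((i, j))
    return close, examined


# Hypothetical data: 1000 uniform random points in the unit square.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
eps = 0.02

brute = all_pairs_within(points, eps)
fast, examined = grid_pairs_within(points, eps)
total = len(points) * (len(points) - 1) // 2

assert brute == fast  # same close pairs, far fewer distance computations
print(f"{examined} of {total} pairs examined")
```

Both functions return the same set of close pairs, but the grid version computes distances only for points in neighbouring cells and never materializes an n × n matrix. Whether a comparable pruning applies to mass-based dissimilarity is exactly what the paper investigates; the sketch only shows why touching a small fraction of pairs saves both time and memory.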