Toward Autonomous Distributed Clustering
Abstract: Compared to traditional centralized clustering, distributed
clustering offers the advantage of parallel processing of data from different
sites, enhancing the efficiency of clustering while preserving the privacy
of the data at each site. However, most existing distributed clustering
techniques require manual tuning of several parameters or hyperparameters,
which can pose challenges for practical applications. This paper
introduces a novel parameter-free distributed clustering framework known
as distributed torque clustering (DTC). When handling data (or subsets of data)
distributed across multiple sites, DTC executes two main steps.
The first step is a data reduction at each site using torque clustering,
and the second step involves performing global clustering with weighted
torque clustering. We compare DTC against six state-of-the-art distributed
clustering algorithms and automatic centralized clustering techniques on
ten large-scale or medium-scale datasets. The results show that the average
rank of DTC across all datasets is at least three times better than that of
any competing algorithm. Additionally, DTC can accurately predict
the ground-truth number of clusters in nine out of ten datasets, further
demonstrating its competitive performance and practical potential.
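The two-step framework described above (local data reduction at each site, then weighted global clustering over the reduced summaries) can be sketched as follows. Since the paper's torque clustering implementation is not given here, this sketch substitutes k-means as a purely illustrative stand-in; note that, unlike DTC, k-means requires the number of clusters as a parameter, and all dataset sizes and values are invented for the example.

```python
# Illustrative sketch of a two-step distributed clustering pipeline.
# k-means stands in for torque clustering; DTC itself is parameter-free,
# whereas k-means requires k. All data below are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three "sites", each holding its own private data (never shared raw).
sites = [rng.normal(loc=c, scale=0.3, size=(200, 2)) for c in (0.0, 3.0, 6.0)]

# Step 1: local data reduction at each site -- cluster locally and
# transmit only the representatives (centroids) plus their member
# counts, which later serve as weights.
local_centroids, local_weights = [], []
for X in sites:
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
    local_centroids.append(km.cluster_centers_)
    local_weights.append(np.bincount(km.labels_, minlength=5))

C = np.vstack(local_centroids)      # pooled representatives from all sites
w = np.concatenate(local_weights)   # how many raw points each represents

# Step 2: weighted global clustering over the pooled representatives,
# so each summary point counts in proportion to the data it stands for.
global_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    C, sample_weight=w
)
print(global_km.cluster_centers_.shape)  # final global cluster centers
```

Only the weighted summaries leave each site, which is how this style of pipeline parallelizes the work while keeping raw records local.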