Efficiency and Effectiveness Clustering with Adaptive Neighbors for Large-scale Data

Ping Hu, Haijun Yan, Chenggang Lu, Yesong Xu

Published: 01 Jan 2025, Last Modified: 31 Jul 2025. Expert Syst. Appl. 2025. License: CC BY-SA 4.0

Abstract: Clustering with Adaptive Neighbors (CAN), which learns a similarity matrix by assigning adaptive and optimal neighbors, has attracted increasing attention. However, existing CAN methods usually focus on improving clustering performance, even though constructing a high-quality similarity matrix is well known to be time-consuming. In the era of big data, CAN methods therefore face significant challenges in processing large-scale datasets, which restricts their widespread application. To deal with this problem, this study proposes Deep Encoding enhanced Adaptive Clustering (DEAC) for large-scale data, which combines deep encoding techniques with adaptive clustering to address the scalability issue in data analysis. By leveraging deep encoding, the proposed method learns an effective encoded representation of the similarity matrix, thereby improving the clustering efficiency of the CAN model. Specifically, the DEAC framework samples the large-scale data and uses the sampled subset to train a deep encoder that captures the graph structure and nearest neighbors. The trained deep encoder is then applied to the original large-scale data to rapidly construct an effective Similarity Graph (SG) for subsequent clustering. Comprehensive experiments on four real-world datasets demonstrate the efficiency and effectiveness of DEAC compared with traditional clustering models.
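
To make the cost argument concrete, the following is a minimal sketch of the first stage only: drawing a random subset and building an adaptive-neighbor similarity matrix on it. It assumes the closed-form neighbor weights commonly used in the CAN literature (each point gets k nonzero weights that sum to 1); the abstract does not state DEAC's exact formulation, and the deep encoder is not reproduced here. The names `can_similarity`, `k`, and the sample size are hypothetical choices for illustration.

```python
import numpy as np


def can_similarity(X, k):
    """Adaptive-neighbor similarity (assumed closed form, not taken from the paper).

    For each point, the k nearest neighbors get weight
    (d_{k+1} - d_j) / (k * d_{k+1} - sum_{h<=k} d_h), all others get 0.
    Requires at least k + 2 points.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances: O(n^2) time and memory,
    # which is the bottleneck on large-scale data.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D, 0.0, out=D)      # clip tiny negatives from round-off
    np.fill_diagonal(D, np.inf)    # a point is never its own neighbor

    # Indices and distances of the k + 1 nearest neighbors, ascending.
    idx = np.argsort(D, axis=1)[:, : k + 1]
    d = np.take_along_axis(D, idx, axis=1)

    denom = k * d[:, k] - d[:, :k].sum(axis=1)
    w = (d[:, [k]] - d[:, :k]) / (denom[:, None] + 1e-12)

    S = np.zeros((n, n))
    np.put_along_axis(S, idx[:, :k], w, axis=1)
    return S


# Hypothetical usage: build the graph on a sampled subset only.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(2000, 8))                 # stand-in for a large dataset
sample = rng.choice(X_full.shape[0], size=200, replace=False)
S_sub = can_similarity(X_full[sample], k=10)        # 200 x 200 instead of 2000 x 2000
```

Because the pairwise-distance step grows quadratically with the number of points, restricting it to a sample and letting a trained encoder generalize to the remaining data is one plausible reading of where the claimed efficiency gain comes from.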