Efficient Affinity Propagation Clustering Based on Szemerédi's Regularity Lemma

Jian Hou, Juntao Ge, Huaqiang Yuan

Published: 2024, Last Modified: 16 Jan 2025. KSEM (2) 2024. CC BY-SA 4.0

Abstract: Taking a pairwise similarity matrix as input, the affinity propagation clustering algorithm identifies clusters automatically, without requiring the number of clusters in advance. However, the algorithm carries a heavy computational load due to the large size of the similarity matrix. To improve its efficiency and make it applicable to large-scale datasets, we use Szemerédi’s regularity lemma. Based on this lemma, we partition the graph represented by the similarity matrix and derive a reduced graph, which can be regarded as a compressed version of the original graph. With the similarity matrix of the reduced graph as input, we run affinity propagation clustering and then map the resulting labels back to the original graph, thereby obtaining the final clustering result. In extensive experiments, we investigate how the parameters involved influence the clustering results and discuss the reasons behind their behavior. In experiments on a series of real datasets, the enhanced algorithm outperforms the original one in both computational efficiency and clustering quality. It also compares favorably to some recent algorithms.
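
The reduce, cluster, and map-back pipeline described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the partition `classes` is supplied by the caller rather than computed by a regularity-lemma procedure, the reduced similarity between two classes is taken to be the mean pairwise similarity (a density-style weight), and every original point simply inherits the label of its class. All function names here are hypothetical, and `cluster_fn` stands in for affinity propagation on the reduced matrix.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def reduce_similarity(sim: Matrix, classes: Sequence[int], k: int) -> Matrix:
    """Build a k x k reduced matrix of mean similarities between classes.

    Assumption: `classes[i]` is the index (0..k-1) of the partition class
    that contains point i. Self-pairs (i == i) are ignored.
    """
    n = len(sim)
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = classes[i], classes[j]
            sums[a][b] += sim[i][j]
            counts[a][b] += 1
    # Pairs of classes with no contributing point pairs get similarity 0.0.
    return [
        [sums[a][b] / counts[a][b] if counts[a][b] else 0.0 for b in range(k)]
        for a in range(k)
    ]


def map_labels_back(reduced_labels: Sequence[int], classes: Sequence[int]) -> List[int]:
    """Give each original point the cluster label assigned to its class."""
    return [reduced_labels[c] for c in classes]


def cluster_via_reduced_graph(
    sim: Matrix,
    classes: Sequence[int],
    k: int,
    cluster_fn: Callable[[Matrix], Sequence[int]],
) -> List[int]:
    """Cluster the reduced graph with `cluster_fn`, then map labels back."""
    reduced = reduce_similarity(sim, classes, k)
    reduced_labels = cluster_fn(reduced)
    return map_labels_back(reduced_labels, classes)
```

One possible choice for `cluster_fn` is scikit-learn's `AffinityPropagation` with `affinity="precomputed"`, wrapped so that it returns the fitted `labels_`. The gain in efficiency comes from running the clustering step on a k x k matrix instead of the full n x n one, which is why the quality of the partition matters for the final result.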