Abstract: Graph-based clustering has shown promise, partly because affinity graphs encode rich relationships among data points. However, the graph representation also imposes a heavy computation and storage load on large-scale datasets. Several previous works suggest that graph-based clustering can be improved with Szemerédi’s regularity lemma, which roughly states that every graph can be partitioned into a small number of random-like graphs. We find in experiments that the results of these methods are sensitive to the parameters involved. We therefore thoroughly investigate how several parameters influence clustering results and discuss the reasons behind their behavior. This investigation yields some clues for determining these parameters in practical applications. In experiments on a number of real datasets, we find that, with proper parameters, the regularity lemma significantly improves both clustering quality and computational efficiency. Furthermore, the experiments show that two relatively old-fashioned algorithms, once enhanced in this way, outperform recent state-of-the-art ones. This work takes a further step in extending the application of the regularity lemma from the purely theoretical to the practical realm.