Graph Minimum Factor Distance and Its Application to Large-Scale Graph Data Clustering

Jicong Fan

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0

TL;DR: The paper proposes new methods for comparing and clustering graphs.

Abstract: Measuring the distance or similarity between graphs is the foundation of many graph analysis tasks, such as graph classification and clustering, but remains a challenge on large datasets. In this work, we treat the adjacency matrices of two graphs as two kernel matrices given by some unknown indefinite kernel function performed on two discrete distributions and define the distance between the two distributions as a measure, called MMFD, of the dissimilarity between two graphs. We show that MMFD is a pseudo-metric. Although the initial definition of MMFD seems complex, we show that it has a closed-form solution with extremely simple computation. To further improve the efficiency of large-scale clustering, we propose an MMFD-KM with linear space and time complexity with respect to the number of graphs. We also provide a generalization of MMFD, called MFD, which is more effective in exploiting the information of factors of adjacency matrices. The experiments on simulated graphs intuitively show that our methods are effective in comparing graphs. The experiments on real-world datasets demonstrate that, compared to the competitors, our methods have much better clustering performance in terms of three evaluation metrics and time cost.

Lay Summary: Calculating the distance or similarity between graphs is key to many graph analysis tasks, like graph classification and clustering. However, it's still tough when dealing with large datasets. In our work, we represent two graphs as two discrete distributions and calculate the distance between the two distributions, which leads to a distance measure between graphs, called MMFD. MMFD is a pseudo-metric and has a very simple closed-form solution. To make large-scale clustering more efficient, we introduce MMFD-KM, which has linear space and time complexity relative to the number of graphs. We also expand on MMFD with MFD, which better uses the information in the adjacency matrices. Experiments on simulated graphs show our methods work well for comparing graphs. Real-world dataset tests prove our methods outperform competitors in clustering, based on three metrics and time cost.

Link To Code: https://github.com/jicongfan/Graph-Minimum-Factor-Distance

Primary Area: General Machine Learning->Clustering

Keywords: clustering; graphs; metric learning

Submission Number: 2523
