Abstract: Graph isomorphism is a problem for which there is no known polynomial-time solution. The more general problem of computing graph similarity metrics, such as graph edit distance or maximum common subgraph, is NP-hard. Nevertheless, assessing (dis)similarity between two or more networks is a key task in many areas, such as image recognition, biology, chemistry, and computer and social networks. In this article, we offer a statistical answer to the following questions: (a) “Are networks \(G_1\) and \(G_2\) similar?”, (b) “How different are the networks \(G_1\) and \(G_2\)?” and (c) “Is \(G_3\) more similar to \(G_1\) or \(G_2\)?”. Our comparisons begin with the transformation of each graph into an all-pairs distance matrix. Our node-node distance, the Jaccard distance, has been shown to offer an accurate reflection of the graph’s connectivity structure. We then model these distances as probability distributions. Finally, we use well-established statistical tools to gauge the (dis)similarity between graphs in terms of the (dis)similarity between their probability distributions. This comparison procedure aims to detect (dis)similarities in connectivity structure, and community structure in particular, not in easily observable graph characteristics such as degrees, edge counts or density. We test our hypothesis that graphs can be meaningfully summarized and compared via their node-node distance distributions on several synthetic and real-world graphs. Empirical results support this hypothesis and demonstrate the accuracy of our comparison technique.
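The pipeline summarized in the abstract (graph, then all-pairs distances, then distribution comparison) can be illustrated with a minimal sketch. Several choices here are assumptions and are not taken from the article: the Jaccard distance is computed on closed neighbourhoods (each node counted in its own neighbourhood), the two-sample Kolmogorov-Smirnov statistic stands in for the "well-established statistical tools", and the helper names `from_edges`, `jaccard_distances` and `ks_statistic` are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_distances(adj):
    """Sorted Jaccard distances over all unordered node pairs.

    Assumption: neighbourhoods are closed (a node belongs to its own
    neighbourhood), so adjacent nodes always share at least two members.
    """
    closed = {v: nbrs | {v} for v, nbrs in adj.items()}
    dists = []
    for u, v in combinations(sorted(closed), 2):
        inter = len(closed[u] & closed[v])
        union = len(closed[u] | closed[v])
        dists.append(1.0 - inter / union)
    return sorted(dists)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic for two sorted samples:
    the largest gap between their empirical CDFs."""
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in points
    )


# Two triangles joined by a bridge (two clear communities) ...
two_communities = from_edges(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
)
# ... versus a 6-cycle on the same node set (no community structure).
cycle = from_edges([(i, (i + 1) % 6) for i in range(6)])

d_a = jaccard_distances(two_communities)
d_b = jaccard_distances(cycle)

print("KS(A, A) =", ks_statistic(d_a, d_a))
print("KS(A, B) =", ks_statistic(d_a, d_b))
```

In this toy setting the two graphs have the same number of nodes and nearly the same number of edges, yet their distance distributions differ: the community graph produces pairs at distance 0 (nodes with identical closed neighbourhoods), while the cycle has none. A larger statistic suggests a larger difference in connectivity structure, although a significance test would be needed to turn it into a statistical answer.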