Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics

Sourav Dutta, Pratik Nayek, Arnab Bhattacharya

2017 (modified: 12 Nov 2022)WWW 2017Readers: Everyone

Abstract: Labeled graphs provide a natural way of representing entities, relationships and structures within real datasets such as knowledge graphs and protein interactions. Applications such as question answering, semantic search, and motif discovery entail efficient approaches for subgraph matching involving both label and structural similarities. Given the NP-completeness of subgraph isomorphism and the presence of noise, approximate graph matching techniques are required to handle queries in a robust and real-time manner. This paper presents a novel technique to characterize the subgraph similarity based on statistical significance captured by chi-square statistic. The statistical significance model takes into account the background structure and label distribution in the neighborhood of vertices to obtain the best matching subgraph and, therefore, robustly handles partial label and structural mismatches. Based on the model, we propose two algorithms, VELSET and NAGA, that, given a query graph, return the top-k most similar subgraphs from a (large) database graph. While VELSET is more accurate and robust to noise, NAGA is faster and more applicable for scenarios with low label noise. Experiments on large real-life graph datasets depict significant improvements in terms of accuracy and running time in comparison to the state-of-the-art methods.

0 Replies