GREED: A Neural Framework for Learning Graph Distance Functions

Rishabh Ranjan; Siddharth Grover; Sourav Medya; Venkatesan Chakaravarthy; Yogish Sabharwal; Sayan Ranu

Published: 31 Oct 2022, Last Modified: 05 Jan 2023

NeurIPS 2022 Accept

Readers: Everyone

Keywords: edit distance, subgraph edit distance, learning graph distance, graph neural networks

TL;DR: Learning graph and subgraph edit distance using graph neural networks

Abstract: Similarity search in graph databases is one of the most fundamental operations in graph analytics. Among various distance functions, graph and subgraph edit distances (GED and SED respectively) are two of the most popular and expressive measures. Unfortunately, exact computations for both are NP-hard. To overcome this computational bottleneck, neural approaches to learn and predict edit distance in polynomial time have received much interest. While considerable progress has been made, there exist limitations that need to be addressed. First, the efficacy of an approximate distance function lies not only in its approximation accuracy, but also in the preservation of its properties. To elaborate, although GED is a metric, its neural approximations do not provide such a guarantee. This prohibits their usage in higher-order tasks that rely on metric distance functions, such as clustering or indexing. Second, several existing frameworks for GED do not extend to SED because SED is asymmetric. In this work, we design a novel Siamese graph neural network called GREED, which, through a carefully crafted inductive bias, learns GED and SED in a property-preserving manner. Through extensive experiments across 10 real graph datasets containing up to 7 million edges, we establish that GREED is not only more accurate than the state of the art, but also up to 3 orders of magnitude faster. Even more significantly, because it preserves the triangle inequality, the generated embeddings are indexable; consequently, even in a CPU-only environment, GREED is up to 50 times faster than GPU-powered computations of the closest baseline.
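The abstract does not state the distance functions themselves, so the following is only an illustrative sketch of why a distance computed on embeddings can preserve the triangle inequality and why that makes the embeddings indexable. It assumes, hypothetically, that a symmetric distance is the Euclidean norm of the embedding difference and that an asymmetric distance is the norm of the positive part of that difference. The function names, the random vectors standing in for graph neural network outputs, and the single-pivot pruning scheme are all assumptions made for illustration.

```python
import math
import random


def symmetric_distance(x, y):
    """Euclidean distance between two embeddings (symmetric, a metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def asymmetric_distance(query, target):
    """Norm of the positive part of (query - target).

    Asymmetric in general, yet it still satisfies the triangle inequality
    because max(a + b, 0) <= max(a, 0) + max(b, 0) holds per coordinate.
    """
    return math.sqrt(sum(max(a - b, 0.0) ** 2 for a, b in zip(query, target)))


def range_search(query, database, pivot, pivot_dists, radius):
    """Return indices within `radius` of `query`, pruning with one pivot."""
    d_qp = symmetric_distance(query, pivot)
    hits, skipped = [], 0
    for i, (emb, d_px) in enumerate(zip(database, pivot_dists)):
        # Triangle inequality lower bound: d(q, x) >= |d(q, p) - d(p, x)|.
        if abs(d_qp - d_px) > radius:
            skipped += 1  # x cannot be within the radius; skip the distance call
            continue
        if symmetric_distance(query, emb) <= radius:
            hits.append(i)
    return hits, skipped


# Random vectors stand in for embeddings produced by a trained model.
rng = random.Random(0)
database = [[rng.uniform(0, 1) for _ in range(8)] for _ in range(200)]
pivot = database[0]
pivot_dists = [symmetric_distance(pivot, e) for e in database]  # computed once, offline
query = [rng.uniform(0, 1) for _ in range(8)]

hits, skipped = range_search(query, database, pivot, pivot_dists, radius=0.6)
brute = [i for i, e in enumerate(database) if symmetric_distance(query, e) <= 0.6]
```

Here `hits` matches the brute-force result `brute`, while `skipped` counts the database items whose distance to the query never had to be computed. A learned distance without the triangle inequality would not permit this kind of pruning safely.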

Supplementary Material: pdf
