Graph Edit Distance Evaluation Datasets: Pitfalls and Mitigation

Published: 16 Nov 2024, Last Modified: 04 Dec 2024 · LoG 2024 Poster · CC BY 4.0
Keywords: graph neural networks, graph edit distance, dataset quality
TL;DR: We present a new suite of datasets for Graph Edit Distance (GED) estimation that eliminates isomorphism bias and incorporates variable edit costs, enabling more accurate benchmarking of state-of-the-art methods.
Abstract: Graph Edit Distance (GED) is a powerful framework for modeling both symmetric and asymmetric relationships between graph pairs under various cost settings. Due to the combinatorial intractability of exact GED computation, recent advancements have focused on neural GED estimators that approximate GED by leveraging data distribution characteristics. These estimators map the structural information of graphs into an embedding space while preserving essential graph invariances and equivariances. However, the datasets commonly used to benchmark such neural models exhibit two critical flaws: (1) significant isomorphism bias leading to a high likelihood of train-test leakage, with only a small fraction of graphs being structurally unique (8.9\% in the Linux, 25.8\% in the IMDB, and 41\% in the AIDS datasets), and (2) reliance on uniform edit costs for GED ground truths. These limitations constrain the evaluation of the learning and generalization capabilities of competing methods, casting doubt on the validity of existing results and suggesting potential biases in comparative studies. In this work, we introduce and release a comprehensive suite of datasets specifically designed to rectify these shortcomings. Our datasets eliminate isomorphism leakage and incorporate a range of edit costs, facilitating more accurate assessment of GED methods. We benchmark state-of-the-art methods on these datasets, providing insights into their true generalization capabilities. By making these datasets available as open-source resources, we offer a robust foundation for advancing research in GED estimation.
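The two dataset flaws the abstract identifies can be illustrated concretely. The sketch below (not the authors' code; the toy graphs and variable names are hypothetical) uses networkx to show (1) how isomorphic duplicates across splits leak labels, and (2) how the GED ground truth changes once edit operations carry non-uniform costs:

```python
import networkx as nx

# Flaw 1: isomorphism bias. If a test graph is isomorphic to a training
# graph, its GED to any query is effectively memorizable, not generalized.
train = [nx.path_graph(3), nx.cycle_graph(4)]
test = [nx.path_graph(3)]  # structurally identical to a training graph

leaked = sum(any(nx.is_isomorphic(g, h) for h in train) for g in test)
print(f"{leaked}/{len(test)} test graphs are isomorphic to a training graph")

# Flaw 2: uniform edit costs. Exact GED between a 3-node path and a
# triangle is one edge insertion. Under networkx's default uniform costs
# that insertion costs 1; if edge edits are made twice as expensive, the
# ground-truth GED for the same pair doubles.
g1, g2 = nx.path_graph(3), nx.cycle_graph(3)
uniform = nx.graph_edit_distance(g1, g2)
weighted = nx.graph_edit_distance(
    g1, g2,
    edge_del_cost=lambda attrs: 2.0,  # cost per deleted edge
    edge_ins_cost=lambda attrs: 2.0,  # cost per inserted edge
)
print(f"uniform GED = {uniform}, weighted GED = {weighted}")
```

A benchmark built on uniform costs alone cannot distinguish estimators that have truly learned the cost-sensitive matching problem from those that exploit the symmetric special case, which is the motivation for the variable-cost ground truths released here.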
Submission Type: Extended abstract (max 4 main pages).
Submission Number: 184