Bagel: A Benchmark for Assessing Graph Neural Network Explanations

TMLR Paper595 Authors

14 Nov 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: Evaluating interpretability approaches for graph neural networks (GNNs) is known to be challenging due to the lack of a commonly accepted benchmark. Given a GNN model, several interpretability approaches exist to explain it, each with diverse (and sometimes conflicting) evaluation methodologies. In this paper, we propose Bagel, a benchmark for evaluating explainability approaches for GNNs. In Bagel, we first propose four diverse regimes for evaluating GNN explanations: 1) faithfulness, 2) sparsity, 3) correctness, and 4) plausibility. We reconcile multiple evaluation metrics from the existing literature and cover diverse notions for a holistic evaluation. Our graph datasets range from citation networks and document graphs to graphs derived from molecules and proteins. We conduct an extensive empirical study of four GNN models and nine post-hoc explanation approaches on node and graph classification tasks. We open-source both the benchmark and reference implementations and make them available at https://anonymous.4open.science/r/Bagel-benchmark-F451/.
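To illustrate the kind of metric the faithfulness regime covers, here is a minimal sketch (not Bagel's actual API) of a fidelity-style check for a node-classification explanation: the prediction on the full graph is compared with the prediction when only the edges an explainer marks as important are kept. The function name, arguments, and the assumption that the model is callable as `model(x, edge_index)` are illustrative, not taken from the benchmark.

```python
import torch


def faithfulness(model, x, edge_index, edge_mask, node_idx, threshold=0.5):
    """Hypothetical fidelity-style faithfulness check for one node.

    Assumes `model` is any GNN callable as model(x, edge_index) that returns
    per-node logits, and `edge_mask` holds an importance score per edge.
    Returns 1.0 if the prediction for `node_idx` is unchanged when the graph
    is reduced to the edges the explanation deems important, else 0.0.
    """
    model.eval()
    with torch.no_grad():
        # Prediction on the full graph.
        full_pred = model(x, edge_index)[node_idx].argmax()
        # Keep only edges whose importance exceeds the threshold.
        keep = edge_mask >= threshold
        sparse_pred = model(x, edge_index[:, keep])[node_idx].argmax()
    return float(full_pred == sparse_pred)
```

Averaging such a score over many nodes (or graphs, for graph classification) gives one possible faithfulness estimate; the benchmark's own metrics and implementation details are defined in the paper and repository.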
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We address the reviewers' comments and add results for additional experiments. All text changes in the revised version are in blue. For new tables and figures, we highlight the captions in blue.
Assigned Action Editor: ~Shinichi_Nakajima2
Submission Number: 595