[Re] GNNInterpreter: A probabilistic generative model-level explanation for Graph Neural Networks

Ana Vasilcoiu; T.H.F. Stessen; Thies Kersten; Batu Helvacioglu

Published: 06 Jun 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0

Event Certifications: reproml.org/MLRC/2023/Journal_Track

Abstract: Graph Neural Networks have recently gained recognition for their performance on graph machine learning tasks. Growing attention to these models’ trustworthiness and decision-making mechanisms has sparked interest in explainability techniques, including the model proposed in "GNNInterpreter: A probabilistic generative model-level explanation for Graph Neural Networks" (Wang & Shen (2022)). This work aims to reproduce the findings of the original paper by investigating the main claims made by its authors, namely that GNNInterpreter (i) generates faithful and realistic explanations without requiring domain-specific knowledge, (ii) has the ability to work with various node and edge features, (iii) produces explanations that are representative of the target class and (iv) has a much lower training time compared to XGNN, the current state-of-the-art model-level GNN explanation technique. To reproduce the results, we make use of the open-source implementation and test the interpreter on the same datasets and GNN models as in the original paper. We conduct an enhanced quantitative and qualitative evaluation, and additionally extend the original experiments to include another real-world dataset. Our results show that we are not able to validate the first claim, due to significant hyperparameter and seed variation as well as training instability. Furthermore, we partially validate the second claim by testing on datasets with different node and edge features, but we reject the third claim due to GNNInterpreter’s failure to outperform XGNN in producing dataset-aligned explanations. Lastly, we are able to confirm the last claim.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
- We extended the discussion of GNNInterpreter’s methodology in Section 4, clarifying the notation used in the equations and adding information about the different components of the model.
- We further investigated the cause of GNNInterpreter’s instability (see new section 6.4 - Analysis of training instability). Additionally, we drew new insights and guidelines on how to improve the model in the Discussion section.
- We added a paragraph (in section 6.1) giving more insight into individual model performance on the MUTAG dataset. It shows that for some lucky seeds the performance does match that of the original paper.
- We expanded Section 2 with our main contributions, both those directly related to the original paper’s claims and our findings that go beyond the original scope.
- We edited our conclusion and made our claims less strong, focusing on whether our results support or contradict the claims rather than fully confirming or denying them.
- We changed the order and wording surrounding claim 1 to emphasize that we tried out different variations of implementations and did not always directly follow the (unofficial) GitHub implementation.
- We added a paragraph to the methodology (Section 5) explaining why the GitHub implementation was chosen and how it still allows us to draw conclusions about GNNInterpreter.
- We addressed all small issues and clarity fixes: we included the missing reference to GCExplainer in the literature survey and fixed the incorrect reference (Gilbert); we added a more detailed explanation of the symbols in Equation 2; we added clarification to the Verification Study in section 5.4 (the fake motifs are manually created by qualitatively extracting rules from explanation graphs; they share common features with the ground truth motif but can never be identical to it, since they are crafted manually); and we fixed the inconsistent wording in Table 3.
- We conducted additional experiments, namely a comparative analysis on the MUTAG dataset with GNNExplainer and further investigations of GNNInterpreter’s training instability. We also provided a more detailed discussion of potential reasons and guidelines for improvement.
- The additional experiments using GNNExplainer can be viewed in section 6.3 - Results beyond original paper.

Code: https://github.com/MeneerTS/GNNinterpreter_Reproduction

Assigned Action Editor: ~Nadav_Cohen1

Submission Number: 2196
