- Keywords: Graph Neural Network, Explainer, classification
- Abstract: Scope of Reproducibility: The main claims reproduced in this report are: (1) The PGExplainer is able to correctly identify the ground-truth motif responsible for node and graph classification of a given GNN. (2) The PGExplainer is able to achieve a maximum AUC of 0.987 for node classification and a maximum AUC of 0.926 for graph classification, both with a standard deviation of at most 0.021. (3) The PGExplainer is able to generate explanations for the given node classification tasks in 24 milliseconds or less, and for graph classification tasks in 80 milliseconds or less. Methodology: The codebase provided with the original PGExplainer paper, which is implemented in TensorFlow, was used to reproduce the experiments. To replicate the work, the codebase was also reimplemented in PyTorch. Each dataset is tested 10 times to obtain the average AUC and inference time. Results: The TensorFlow implementation is able to find and show the correct motifs for all the tested datasets. The PyTorch implementation is able to do the same, except for the MUTAG dataset. For the TensorFlow implementation, the node classification AUC is higher than stated in the paper, while the graph classification AUC is mostly similar. The inference time measured with the PyTorch implementation appears to be in the same range as the results shown in the original study. What was easy: The paper was well written, which made it easy to understand the concepts and techniques that were used. In addition, the models were described precisely and in great detail, which made implementing them much easier. What was difficult: Although porting the TensorFlow code to PyTorch was not a big obstacle, the rest of the code was not well structured or well written. A number of inconsistencies were found between the code and the paper, mostly in the stated hyperparameters. In addition, the provided code did not support GPU processing out of the box. The last dataset used in the original study, the MUTAG dataset, was very large, resulting in some computational problems. Even though the computational problems were eventually managed, the model could not be tested properly on this dataset due to its size. Communication with original authors: No contact has been made with the original authors of the paper.
- Paper Url: https://openreview.net/forum?id=WsphwsV5hV&noteId=eGNpudm8y-N
- Supplementary Material: zip