[Re] $\mathcal{G}$-Mixup: Graph Data Augmentation for Graph ClassificationDownload PDF

Published: 02 Aug 2023, Last Modified: 02 Aug 2023MLRC 2022Readers: Everyone
Keywords: rescience c, machine learning, data augmentation, graph neural networks, pytorch geometric, mixup
TL;DR: We provide a reproducibility report of G-Mixup, novel graph data augmentation method.
Abstract: Scope of Reproducibility This paper presents a novel augmentation method that can be used for graph classification tasks: $\mathcal{G}$-Mixup. Our goal is to reproduce eight claims that the authors make in their paper. The first two claims relate to the properties of graphons estimated from graphs, which are the main components of the method. Claims three to eight relate to the superior performance of the method compared to other augmentation strategies. Methodology To reproduce the results, we use the open-source implementation of the method provided by the authors, with a few modifications. We write from scratch all the experiments and pipelines needed to defend the claims of the paper. Additionally, we implement three out of four baseline augmentation methods that are compared to the novel method. For one part of the experiments, we use a local computer and run the experiments on a CPU, with a total of 31.7 CPU hours, while for other more demanding experiments, we use a GPU-accelerated machine for a total of 157.3 GPU hours. Results Due to many missing implementation details, we were not able to reproduce all of the original results. Some claims can be supported by our results, but most results are very vague. Even though the new method outperforms the baselines in certain scenarios, we find that the superiority of the method is not as strong as presented in the original paper. What was easy The novel augmentation method and its theoretical justification are presented intuitively in the paper, and it was easy to grasp the main ideas of the paper. What was difficult While the code with the method implementation was given, the reproduction of all the results required much more code and details than what was provided in the paper. This means that we had to make a lot of educated guesses about the experimental settings, choices of hyperparameters, and details about the models used. Communication with original authors We contacted the authors on two occasions. The first time was very early in the reproduction process, when we asked about the details of the graph estimation methods that they used and which weren't explained in the paper, to which they responded swiftly. On the second occasion, we asked about the experimental settings and hyperparameter details, but we did not receive a reply.
Paper Url: https://proceedings.mlr.press/v162/han22c.html
Paper Review Url: https://openreview.net/forum?id=dIVrWHP9_1i
Paper Venue: ICML 2022
Confirmation: The report pdf is generated from the provided camera ready Google Colab script, The report metadata is verified from the camera ready Google Colab script, The report contains correct author information., The report contains link to code and SWH metadata., The report follows the ReScience latex style guides as in the Reproducibility Report Template (https://paperswithcode.com/rc2022/registration)., The report contains the Reproducibility Summary in the first page., The latex .zip file is verified from the camera ready Google Colab script
Latex: zip
Journal: ReScience Volume 9 Issue 2 Article 1
Doi: https://www.doi.org/10.5281/zenodo.8173650
Code: https://archive.softwareheritage.org/swh:1:dir:b3da6c4106727ef884219a1d04275a3306672c34
0 Replies