# Predicting Network Motif Fingerprints with Graph Neural Networks


**Authors:** -.


*Abstract*

Graph Neural Networks (GNNs) are a predominant method for graph representation learning. However, beyond subgraph frequency estimation, their application to network motif prediction remains underexplored, with no established benchmarks in the literature. We address this problem, framing motif estimation as an extension of subgraph frequency estimation. Our approach formulates motif estimation as a multitarget regression problem, optimising for interpretability and improving stability and scalability on large graphs. We validate our method using a large synthetic dataset generated by graph generators that mimic real-world data, and further test it on real-world graphs. Our experiments reveal that 1-WL limited models trained on synthetic data struggle to accurately predict the motif profiles of real-world networks. However, beyond their reasonable performance on synthetic data, they can generalise to approximate the graph generation processes of real-world networks when their predicted motif profiles are compared with those originating from synthetic data. This first study on GNN-based motif estimation sets a benchmark and should open pathways for further developing the connection between motif profiles and subgraph frequency from a graph representation learning perspective.

---

## [Hephaestus](hephaestus/README.md) - Code for Experiments

The folder [hephaestus](hephaestus) contains all the code used to generate the graphs (synthetic and real) together with their labels and features. It also contains all the code that defines and trains the models. Follow the README in that folder to reproduce each step of the experiments.

## [Hephaestus Lab](hephaestus_lab/README.md) - Code for Analysis

The folder [hephaestus_lab](hephaestus_lab) contains all the code used to analyse the results of training the models. Follow the README in that folder to reproduce each step of the analysis.

## By Sections from the Paper

* Section 5 Datasets: Follow [1 Generating the synthetic and real graphs and their labels](hephaestus/README.md#1-generating-the-synthetic-and-real-graphs-and-their-labels) and [2 Generating PyG datasets](hephaestus/README.md#2-generating-pyg-datasets).
* Appendix A.3, A.4, A.5: Follow [1 Dataset Stats](hephaestus_lab/README.md#1-data-details).
* Section 7 Results and Appendix D: Follow [3 Training the models](hephaestus/README.md#3-training-the-models), [2 Compare the Training Results](hephaestus_lab/README.md#2-compare-the-training-results) and [3 Evaluate and Analyse the Predictions](hephaestus_lab/README.md#3-evaluate-and-analyse-the-predictions).
  * Appendix D.2: Follow [6 Validation of the Assumptions Made](hephaestus_lab/README.md#6-validation-of-the-assumptions-made).
  * Appendix D.3: Follow [3 Evaluate and Analyse the Predictions](hephaestus_lab/README.md#3-evaluate-and-analyse-the-predictions).
  * Section 7.1 Discussion of the Results: Follow [4 Persistent Patterns](hephaestus_lab/README.md#4-persistent-patterns).
  * Section 7.2 Tendency For Persistent Patterns: Follow [5 Dropout Experiments](hephaestus_lab/README.md#5-dropout-experiments).
  * Section 7.3 Model Predictions in the Synthetic Dataset: Same as D.3.
  * Section 7.4 Model Predictions in the Real-World Dataset: Same as D.3.

To see all predictions for the real-world data, download the file `experiment_results/plots_27-09-2024.zip` from [Figshare](https://figshare.com/s/794d3e3dc66ee09c0e86 "Figshare: experiment_results/plots_27-09-2024.zip") and navigate to the folder `evaluate_models/CORRECTIONS`.
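
If you prefer to pull out only the prediction plots, the following is a minimal sketch, not part of the repository. It assumes the archive was saved locally as `plots_27-09-2024.zip` and that `evaluate_models/CORRECTIONS` appears somewhere inside it; the helper name `extract_folder` and the output directory `plots` are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_folder(archive: Path, folder: str, dest: Path) -> list[Path]:
    """Extract every file whose archive path contains `folder`; return the extracted paths."""
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Match the folder anywhere in the member path, because the
            # top-level layout of the archive is an assumption here.
            if folder in name and not name.endswith("/"):
                extracted.append(Path(zf.extract(name, dest)))
    return sorted(extracted)


if __name__ == "__main__":
    # Assumed local file name for the archive downloaded from Figshare.
    archive = Path("plots_27-09-2024.zip")
    if archive.exists():
        for path in extract_folder(archive, "evaluate_models/CORRECTIONS", Path("plots")):
            print(path)
```

Matching on a path fragment rather than a fixed prefix keeps the sketch usable even if the archive wraps its contents in an extra top-level directory.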

---

<sub><sup>Why are the folders named hephaestus? I needed a name to start the project and I like Greek myths and legends :).</sup></sub>
