# Downloading Datasets
The datasets `twitter-trolls` and `citations` are part of the supplementary materials. The two versions of the `makg` dataset can be downloaded using the following links:
- MAKG (small):
https://lpg2vec-datasets.s3.eu-central-1.amazonaws.com/makg_small.csv
- MAKG (large):
https://lpg2vec-datasets.s3.eu-central-1.amazonaws.com/makg_large.csv
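
As an alternative to downloading the files by hand, the following minimal Python sketch fetches one of the two MAKG files from the links above into `raw_data/`. The helper names `makg_url` and `download_makg` are hypothetical and not part of this repository. The file is saved under its original name, so it may still need to be renamed to `<datasetname>.csv` before preprocessing.

```python
import urllib.request
from pathlib import Path

# Base URL and file names are taken from the download links listed above.
BASE_URL = "https://lpg2vec-datasets.s3.eu-central-1.amazonaws.com"
MAKG_FILES = {"small": "makg_small.csv", "large": "makg_large.csv"}


def makg_url(version):
    """Return the download URL for the requested MAKG version ('small' or 'large')."""
    return f"{BASE_URL}/{MAKG_FILES[version]}"


def download_makg(version, target_dir="raw_data"):
    """Download one MAKG version into target_dir and return the local path.

    Hypothetical helper: the repository itself does not ship a download script.
    """
    target = Path(target_dir) / MAKG_FILES[version]
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(makg_url(version), str(target))
    return target
```

Calling `download_makg("small")` from the repository root would place the file at `raw_data/makg_small.csv`.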

# Preprocessing
- Place the raw dataset in `raw_data/` and name it `<datasetname>.csv`.
- Make sure that the column containing the vertex ID is called `entity_id`. If it is not, rename that column in `raw_data/<datasetname>.csv`.
- To preprocess the data, run `python3 preprocess_dataset.py <datasetname>`. The preprocessing script reads the raw data, filters out invalid rows and stores two files called `<datasetname>_vertices.csv` and `<datasetname>_edges.csv` under `GraphGymPyG/datasets/<datasetname>/raw/`.
  - Exception: To preprocess the dataset `MAKG`, run `preprocess_makg_dataset.py` instead of `preprocess_dataset.py`. This special preprocessing script uses `hasdiscipline` edges to create an additional property called `field` for all vertices labeled `paper`.
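
If the vertex ID column has a different name, it can be renamed without opening the file in an editor. The sketch below is one way to do this with Python's standard `csv` module. The function `rename_id_column` is a hypothetical helper, not part of this repository, and it assumes a comma-separated file with a single header row. Note that it rewrites the file, so quoting and line endings may be normalized.

```python
import csv
from pathlib import Path


def rename_id_column(csv_path, old_name, new_name="entity_id"):
    """Rename the header column old_name to new_name, streaming the data rows.

    Hypothetical helper; assumes a comma-separated file with one header row.
    """
    src = Path(csv_path)
    tmp = src.with_name(src.name + ".tmp")
    with src.open(newline="", encoding="utf-8") as fin:
        reader = csv.reader(fin)
        header = next(reader, None)
        # Validate before creating the temporary file so nothing is left behind.
        if header is None or old_name not in header:
            raise ValueError(f"column {old_name!r} not found in {src}")
        header[header.index(old_name)] = new_name
        with tmp.open("w", newline="", encoding="utf-8") as fout:
            writer = csv.writer(fout, lineterminator="\n")
            writer.writerow(header)
            writer.writerows(reader)  # copy the remaining rows unchanged
    tmp.replace(src)  # swap the rewritten file into place
```

For example, `rename_id_column("raw_data/<datasetname>.csv", "id")` would rename a column called `id` (an assumed name) to `entity_id`.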
 
# Analysis
There are two analysis scripts that help to analyze and better understand the datasets. They read the preprocessed data from `GraphGymPyG/datasets/<datasetname>/raw/`, so the analysis can only be performed after preprocessing.
- Run `python3 analyze_dataset_text.py <datasetname>` to create a description of the dataset in text-form. The results are printed to the console. 
- Run `python3 analyze_dataset_plots.py <datasetname>` to create plots depicting an analysis of the dataset. The plots are stored as `analysis/analysis_<datasetname>_vertices.pdf` and `analysis/analysis_<datasetname>_edges.pdf`.

# Configuring Experiments
To configure an experiment, edit the file `GraphGymPyG/configs/<datasetname>.yaml`. This file specifies whether the task is node- or edge-level, whether it is regression or classification, the prediction target, which labels and properties of nodes and edges are included in the respective feature vectors, the model, etc.

# Running Experiments
To run an experiment, execute `cd GraphGymPyG` followed by `bash run_<datasetname>.sh`. The results will be saved in `GraphGymPyG/results/<datasetname>`.
