# HeaRT

Official code for the NeurIPS'23 paper ["Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking"](https://arxiv.org/pdf/2306.10453.pdf) and the ICLR'24 paper ["Revisiting Link Prediction: A Data Perspective"](https://arxiv.org/pdf/2310.00793.pdf).


## Installation

Please see [installation.md](./installation.md) for instructions on installing the required dependencies.


## Download Data

All data can be downloaded by running the `download_data.sh` script:
```
cd HeaRT  # Must be in the root directory
bash download_data.sh
``` 
The download includes the negative samples generated by HeaRT and the splits for Cora, Citeseer, and Pubmed. The OGB datasets are downloaded automatically through the `ogb` package.
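
For readers who want to inspect an OGB dataset outside the benchmarking scripts, the following is a minimal sketch of how the `ogb` package downloads a link prediction dataset on first use. The helper name `load_ogb_link_dataset` and the `root="dataset"` location are assumptions made for illustration; the scripts in this repository may load and store the data differently.

```python
# Minimal sketch (not part of this repository): fetch an OGB link prediction
# dataset through the `ogb` package. Requires `ogb` and `torch_geometric`.

# The four OGB datasets referred to in this README.
OGB_LINK_DATASETS = {"ogbl-collab", "ogbl-ppa", "ogbl-citation2", "ogbl-ddi"}


def load_ogb_link_dataset(name, root="dataset"):
    """Return (graph, split_edge) for an OGB link prediction dataset.

    `root` is an assumed download directory; the data is downloaded
    the first time the dataset is requested.
    """
    if name not in OGB_LINK_DATASETS:
        raise ValueError(f"Unsupported OGB dataset: {name!r}")

    # Imported lazily so the name check above works without `ogb` installed.
    from ogb.linkproppred import PygLinkPropPredDataset

    dataset = PygLinkPropPredDataset(name=name, root=root)
    split_edge = dataset.get_edge_split()  # dict with 'train', 'valid', 'test'
    return dataset[0], split_edge
```

Calling `load_ogb_link_dataset("ogbl-collab")` triggers the download on the first call and reuses the local copy afterwards.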

## Reproduce Results

The commands needed to reproduce all the results, with the appropriate hyperparameters, can be found in the **`scripts/hyparameters`** directory. Each dataset has its own file, which contains the command to train and evaluate each method.

For example, to reproduce the results on ogbl-collab under the existing evaluation setting, the command for each method can be found in the `ogbl-collab.sh` file located in the `scripts/hyperparameter/existing_setting_ogb/` directory. 

To run the code, first change to the directory for the appropriate setting. The available directories are:
- `benchmarking/exist_setting_small`: Run models on Cora, Citeseer, and Pubmed under the **existing setting**.
- `benchmarking/exist_setting_ogb`: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under the **existing setting**.
- `benchmarking/exist_setting_ddi`: Run models on ogbl-ddi under the **existing setting**.
- `benchmarking/HeaRT_small`: Run models on Cora, Citeseer, and Pubmed under **HeaRT**.
- `benchmarking/HeaRT_ogb`: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under **HeaRT**.
- `benchmarking/HeaRT_ddi`: Run models on ogbl-ddi under **HeaRT**.
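
The mapping from dataset and evaluation setting to directory can be summarized in a small lookup. This is only an illustrative sketch: the function `setting_dir` is hypothetical and not part of the repository, and the lowercase names `citeseer` and `pubmed` are assumed by analogy with the `--data_name cora` argument used in the examples.

```python
# Hypothetical helper (not in the repository): pick the benchmarking
# directory for a dataset under either evaluation setting.

SMALL_DATASETS = {"cora", "citeseer", "pubmed"}  # lowercase names assumed
OGB_DATASETS = {"ogbl-collab", "ogbl-ppa", "ogbl-citation2"}
DDI_DATASETS = {"ogbl-ddi"}


def setting_dir(data_name, heart):
    """Return the directory to `cd` into before running a model.

    data_name: dataset name as passed to `--data_name`.
    heart: True for the HeaRT setting, False for the existing setting.
    """
    if data_name in SMALL_DATASETS:
        group = "small"
    elif data_name in OGB_DATASETS:
        group = "ogb"
    elif data_name in DDI_DATASETS:
        group = "ddi"
    else:
        raise ValueError(f"Unknown dataset: {data_name!r}")

    prefix = "HeaRT" if heart else "exist_setting"
    return f"benchmarking/{prefix}_{group}"


if __name__ == "__main__":
    for name in ("cora", "ogbl-collab", "ogbl-ddi"):
        print(name, setting_dir(name, heart=False), setting_dir(name, heart=True))
```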

Below we give examples of running GCN on the different groups of datasets under both settings:

Cora under the **existing setting**:
```
cd benchmarking/exist_setting_small/
python  main_gnn_CoraCiteseerPubmed.py  --data_name cora  --gnn_model GCN --lr 0.01 --dropout 0.3 --l2 1e-4 --num_layers 1  --num_layers_predictor 3 --hidden_channels 128 --epochs 9999 --kill_cnt 10 --eval_steps 5  --batch_size 1024
```

ogbl-collab under the **existing setting** (similar for ogbl-ppa and ogbl-citation2):
```
cd benchmarking/exist_setting_ogb/
python main_gnn_ogb.py  --use_valedges_as_input  --data_name ogbl-collab  --gnn_model GCN --hidden_channels 256 --lr 0.001 --dropout 0.  --num_layers 3 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100  --batch_size 65536 
```

ogbl-ddi under the **existing setting**:
```
cd benchmarking/exist_setting_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN  --lr 0.01 --dropout 0.5  --num_layers 3 --num_layers_predictor 3  --hidden_channels 256 --epochs 9999 --eval_steps 1 --kill_cnt 100 --batch_size 65536 
```

Cora/Citeseer/Pubmed under **HeaRT**:
```
cd benchmarking/HeaRT_small/
python main_gnn_CoraCiteseerPubmed.py  --data_name cora  --gnn_model GCN  --lr 0.001 --dropout 0.5 --l2 0 --num_layers 1 --hidden_channels 256  --num_layers_predictor 3  --epochs 9999 --kill_cnt 10 --eval_steps 5  --batch_size 1024 
```

ogbl-collab under **HeaRT** (similar for ogbl-ppa and ogbl-citation2):
```
cd benchmarking/HeaRT_ogb/
python main_gnn_ogb.py  --data_name ogbl-collab  --use_valedges_as_input --gnn_model GCN  --lr 0.001 --dropout 0.3 --num_layers 3 --hidden_channels 256  --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1  --batch_size 65536  
```

ogbl-ddi under **HeaRT**:
```
cd benchmarking/HeaRT_ddi/
python main_gnn_ddi.py  --data_name ogbl-ddi   --gnn_model GCN --lr 0.01 --dropout 0 --num_layers 3 --hidden_channels 256  --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1  --batch_size 65536    
```



## Generate Negative Samples using HeaRT

The negative samples generated by HeaRT that were used in the study can be reproduced via the scripts in the `scripts/HeaRT/` directory.

A custom set of negative samples can be produced by running the `heart_negatives/create_heart_negatives.py` script. Several options are available to customize the negative samples:
- The common-neighbor (CN) metric used. Can be either `CN` or `RA` (default is `RA`). Specified via the `--cn-metric` argument.
- The aggregation function used. Can be either `min` or `mean` (default is `min`). Specified via the `--agg` argument.
- The number of negatives generated per positive sample. Specified via the `--num-samples` argument (default is 500).
- The Personalized PageRank (PPR) parameters. These are the tolerance used for approximating the PPR (`--eps` argument) and the teleportation probability (`--alpha` argument). `alpha` is fixed at 0.15 for all datasets. For the tolerance, `eps`, we recommend following the settings found in `scripts/HeaRT`.
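
To make the heuristics behind these options more concrete, here is a minimal, self-contained sketch of the standard textbook definitions of the common-neighbor count (`CN`), the resource allocation index (`RA`), and Personalized PageRank on a toy graph. It is not the code in `create_heart_negatives.py`: the function names and the toy edge list are made up for illustration, and PPR is computed here by plain power iteration with `eps` as a stopping threshold, which may differ from the approximation the script uses.

```python
# Illustrative sketch only: textbook CN, RA, and PPR on a toy undirected graph.
from collections import defaultdict


def build_adjacency(edges):
    """Build {node: set(neighbors)} from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def common_neighbors(adj, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def resource_allocation(adj, u, v):
    """RA: sum of 1/degree over the neighbors shared by u and v."""
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / len(adj[w]) for w in shared)


def personalized_pagerank(adj, source, alpha=0.15, eps=1e-6, max_iter=1000):
    """PPR scores from `source` via power iteration.

    alpha: teleportation probability (probability of jumping back to source).
    eps: stop once the L1 change between iterations falls below this value.
    Assumes every node in `adj` has at least one neighbor.
    """
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(max_iter):
        updated = {n: 0.0 for n in adj}
        updated[source] = alpha  # teleported mass returns to the source
        for n, neighbors in adj.items():
            share = (1.0 - alpha) * scores[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        change = sum(abs(updated[n] - scores[n]) for n in adj)
        scores = updated
        if change < eps:
            break
    return scores


# Toy graph (made up for this example).
toy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
toy_adj = build_adjacency(toy_edges)

if __name__ == "__main__":
    print("CN(0, 3):", common_neighbors(toy_adj, 0, 3))
    print("RA(0, 3):", resource_allocation(toy_adj, 0, 3))
    print("PPR from node 0:", personalized_pagerank(toy_adj, source=0))
```

`RA` down-weights shared neighbors with high degree, so it separates candidate pairs more finely than the raw `CN` count. A smaller `eps` gives a more accurate PPR estimate at the cost of more iterations.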


## Updates

**November 3rd, 2023**
* Modified the negative samples for ogbl-collab to **allow** train/valid positive samples to be negatives. Please see Appendix I in the paper for our rationale. 

**February 17th, 2024**
* Uploaded the implementation of the decoupled SEAL from the ICLR 2024 paper ["Revisiting Link Prediction: A Data Perspective"](https://arxiv.org/pdf/2310.00793.pdf). The commands are available in the **`scripts/hyparameters`** directory under the existing setting.

## Cite
```
@inproceedings{
  li2023evaluating,
  title={Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking},
  author={Li, Juanhui and Shomer, Harry and Mao, Haitao and Zeng, Shenglai and Ma, Yao and Shah, Neil and Tang, Jiliang and Yin, Dawei},
  booktitle={Neural Information Processing Systems {NeurIPS}, Datasets and Benchmarks Track},
  year={2023}
}
```
```
@article{mao2023revisiting,
  title={Revisiting link prediction: A data perspective},
  author={Mao, Haitao and Li, Juanhui and Shomer, Harry and Li, Bingheng and Fan, Wenqi and Ma, Yao and Zhao, Tong and Shah, Neil and Tang, Jiliang},
  journal={The Twelfth International Conference on Learning Representations},
  year={2024}
}
```
