# Setup

## Update and Install Required Packages
```bash
apt-get update && \
        apt-get install -y libopenmpi-dev curl wget vim watch procps ncdu tree unzip && \
        apt-get clean && rm -rf /var/lib/apt/lists/*
```

## Download and install Miniforge

```bash
curl -fsSL https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -o miniforge.sh && \
bash miniforge.sh -b -p /opt/conda && \
rm miniforge.sh
```

## Initialize Conda
```bash
. /opt/conda/etc/profile.d/conda.sh
. /opt/conda/etc/profile.d/mamba.sh
/opt/conda/bin/conda init bash
```


## Create a new Conda environment
```bash
mamba env create --quiet --file env.yml
```
    
## Activate the environment
```bash
conda activate megagnn
```

## Install GenAgg
```bash
cd genagg
pip install -e .
```

# Data
The data needed for the experiments can be found on [Kaggle](https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml/data). To use this data with the provided training scripts, you first need to perform a pre-processing step for the downloaded transaction files (e.g. `HI-Small_Trans.csv`):
```bash
python format_kaggle_files.py /path/to/kaggle-files/HI-Small_Trans.csv
```
Make sure to change the filepaths in the `data_config.json` file. The `aml_data` path should be changed to wherever you stored the `formatted_transactions.csv` file generated by the pre-processing step.
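The path update can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `set_config_value` and `update_data_config` are hypothetical, and because the exact layout of `data_config.json` is not described here, the code searches the whole file for the `aml_data` key instead of assuming where it sits.

```python
import json
from pathlib import Path


def set_config_value(node, key, value):
    """Set every occurrence of `key` in nested dicts/lists; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_config_value(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_config_value(item, key, value)
    return changed


def update_data_config(config_path, aml_data_path):
    """Point `aml_data` at the location of formatted_transactions.csv."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    # Assumption: the key is literally named "aml_data", as stated above.
    if set_config_value(config, "aml_data", str(aml_data_path)) == 0:
        raise KeyError(f"no 'aml_data' key found in {config_path}")
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (paths are placeholders):
# update_data_config("data_config.json", "/path/to/kaggle-files")
```

All other keys in the file are left unchanged. The file is rewritten with two-space indentation.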

The data for node classification experiments can be found on [ETH-Kaggle](https://www.kaggle.com/datasets/xblock/ethereum-phishing-transaction-network).
- The NetworkX graph needs to be pre-processed and saved as two separate `.csv` files (`nodes.csv` and `edges.csv`). Each edge is a transaction and holds the fields [from_address, to_address, value, timestamp]. Each node holds [address, first_transaction], where first_transaction is the timestamp of the first transaction the node sent or received.
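The conversion described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the script used for the experiments. The function name `write_graph_csvs` is hypothetical, the edge attribute names `amount` and `timestamp` are assumptions about the downloaded graph (pass different keys if yours differ), and the sender column is written as `from_address`. The function takes `(sender, receiver, attributes)` triples, which is what NetworkX's `G.edges(data=True)` yields, so it only needs the standard library.

```python
import csv


def write_graph_csvs(edges, nodes_path="nodes.csv", edges_path="edges.csv",
                     value_key="amount", timestamp_key="timestamp"):
    """Write edges.csv and nodes.csv from (sender, receiver, attrs) triples."""
    first_seen = {}  # address -> earliest timestamp seen as sender or receiver
    with open(edges_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["from_address", "to_address", "value", "timestamp"])
        for sender, receiver, attrs in edges:
            ts = attrs[timestamp_key]
            writer.writerow([sender, receiver, attrs[value_key], ts])
            # Track the first transaction each address takes part in.
            for address in (sender, receiver):
                if address not in first_seen or ts < first_seen[address]:
                    first_seen[address] = ts
    with open(nodes_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["address", "first_transaction"])
        for address, ts in first_seen.items():
            writer.writerow([address, ts])
    return first_seen


# Example call with a loaded NetworkX graph G:
# write_graph_csvs(G.edges(data=True))
```

Nodes with no transactions never appear in the edge list, so this sketch does not write them to `nodes.csv`.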


# Experiment Commands

## AML Datasets
For other datasets, change the dataset name and run the same commands.
- MEGA-GIN
```bash
python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --flatten_edges --edge_agg_type gin --n_epochs 80 --save_model --task edge_class
```
- MEGA-PNA
```bash
python main.py --data Small_HI --model pna --emlps --reverse_mp --ego --flatten_edges --edge_agg_type pna --n_epochs 80 --save_model --task edge_class
```

- MEGA-GenAgg
```bash
python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --flatten_edges --edge_agg_type gin --node_agg_type genagg --n_epochs 80 --save_model --task edge_class
```

Different combinations of aggregation functions can be selected. For example:
- MEGA(GenAgg)-GIN
```bash
python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --flatten_edges --edge_agg_type genagg --n_epochs 80 --save_model --task edge_class
```

## ETH Dataset
- MEGA-GIN
```bash
python main.py --data ETH-Kaggle --model gin --emlps --ego --reverse_mp --flatten_edges --edge_agg_type gin --task node_class --batch_size 4096 --n_epochs 80
```
- MEGA-PNA
```bash
python main.py --data ETH-Kaggle --model pna --emlps --ego --reverse_mp --flatten_edges --edge_agg_type pna --task node_class --batch_size 4096 --n_epochs 80
```
- MEGA-GenAgg
```bash
python main.py --data ETH-Kaggle --model gin --emlps --ego --reverse_mp --flatten_edges --edge_agg_type genagg --node_agg_type genagg --task node_class --batch_size 4096 --n_epochs 80
```

For baseline ADAMM results:
- ADAMM-GIN
```bash
python main.py --data ETH-Kaggle --model gin --emlps --ego --flatten_edges --edge_agg_type adamm --task node_class --batch_size 4096 --n_epochs 80
```
- ADAMM-PNA
```bash
python main.py --data ETH-Kaggle --model pna --emlps --ego --flatten_edges --edge_agg_type adamm --task node_class --batch_size 4096 --n_epochs 80
```
- ADAMM-GenAgg
```bash
python main.py --data ETH-Kaggle --model gin --node_agg_type genagg --emlps --ego --flatten_edges --edge_agg_type adamm --task node_class --batch_size 4096 --n_epochs 80
```