# MegaGNN

## Setup

- Create a new Conda environment and activate it.
```bash
mamba env create -f env.yml
mamba activate megagnn
```
- Install PyTorch and PyTorch Geometric, then the remaining Python dependencies.
```bash
mamba install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
mamba install pyg -c pyg
pip install -r requirements.txt
```
- Lastly, install `genagg` in editable mode.
```bash
cd genagg
pip install -e .
```

## Datasets

The data needed for the AML experiments can be found on [Kaggle](https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml/data).

The data for node classification experiments can be found on [ETH-Kaggle](https://drive.google.com/drive/folders/1u-NZ96U1SObxXEdWuClOufInbv_5vB1g?usp=sharing).

The data should be organized as follows:

```
/path/to/data/
├── AML/
│   ├── Medium-HI/
│   │   ├── processed/
│   │   │   └── ...
│   │   └── raw/
│   │       ├── HI-Medium_Patterns.txt
│   │       └── HI-Medium_Trans.csv
│   └── Small-HI/
│       ├── processed/
│       └── raw/
│           ├── HI-Small_Patterns.txt
│           └── HI-Small_Trans.csv
└── ETH/
    └── Kaggle/
        ├── processed/
        └── raw/
            ├── edges-kaggle.csv
            └── nodes-kaggle.csv
```

The `processed` directories are created automatically by the dataloader when it processes the raw files, so you only need to provide the raw data files in the structure above.
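Before the first run, it can help to confirm that the raw files are where the dataloader will look for them. The following is a minimal sketch, not part of MegaGNN: the helper name `missing_raw_files` and the `EXPECTED_RAW_FILES` table are hypothetical, and the file names are simply copied from the directory tree in this section.

```python
from pathlib import Path

# Expected raw files per dataset, copied from the directory tree in this README.
EXPECTED_RAW_FILES = {
    "AML/Medium-HI": ["HI-Medium_Patterns.txt", "HI-Medium_Trans.csv"],
    "AML/Small-HI": ["HI-Small_Patterns.txt", "HI-Small_Trans.csv"],
    "ETH/Kaggle": ["edges-kaggle.csv", "nodes-kaggle.csv"],
}


def missing_raw_files(data_root, datasets=None):
    """Return the expected raw files that are not present under data_root.

    `datasets` is an optional list of keys from EXPECTED_RAW_FILES; by default
    every dataset listed in the table is checked.
    """
    root = Path(data_root)
    selected = datasets if datasets is not None else list(EXPECTED_RAW_FILES)
    missing = []
    for dataset in selected:
        for filename in EXPECTED_RAW_FILES[dataset]:
            path = root / dataset / "raw" / filename
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Replace /path/to/data with your own data directory.
    for path in missing_raw_files("/path/to/data", datasets=["AML/Small-HI"]):
        print(f"missing: {path}")
```

An empty return value means every expected raw file for the selected datasets was found.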


Make sure to update the data directory path in your configuration files:

```yaml
dataset:
  dir: /path/to/data
  format: AML     # Choose which benchmark to use
  name: Small-HI  # Choose which dataset to use
```

## Usage

### Configuration

MegaGNN uses YAML configuration files located in the `configs/` directory. These files define model architecture, training parameters, and dataset settings.

Example configuration:

```yaml
model:
  type: MegaGNNModel
  loss_fun: weighted_cross_entropy
  loss_fun_weight: [1, 7]
gnn:
  layer_type: PNA  # GINE, GenAgg, PNA, RGCN, RGCNE
  act: relu
  dropout: 0.08
  layers_mp: 2
  dim_inner: 20
  edge_updates: True
  multi_edge_agg: True
  multi_edge_agg_type: pna
```
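The example sets `loss_fun: weighted_cross_entropy` with `loss_fun_weight: [1, 7]`, but this README does not spell out how the weights are applied. Assuming they are per-class weights in the style of PyTorch's `CrossEntropyLoss(weight=...)` (an assumption, not confirmed here), the plain-Python sketch below shows the effect: errors on class 1 count seven times as much as errors on class 0. The function and the toy logits are illustrative only and are not taken from the MegaGNN code.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-sample cross-entropy losses.

    Each sample's loss is scaled by the weight of its true class, and the sum
    is divided by the total weight of the samples (the convention PyTorch uses
    for CrossEntropyLoss with a `weight` tensor and mean reduction).
    """
    total_loss = 0.0
    total_weight = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the class logits.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        sample_loss = log_norm - row[target]  # equals -log softmax(row)[target]
        total_loss += class_weights[target] * sample_loss
        total_weight += class_weights[target]
    return total_loss / total_weight


# Toy data: the model favours class 0 for both samples, so the class-1
# sample (the second one) is misclassified.
logits = [[2.0, 0.0], [2.0, 0.0]]
targets = [0, 1]

unweighted = weighted_cross_entropy(logits, targets, class_weights=[1, 1])
weighted = weighted_cross_entropy(logits, targets, class_weights=[1, 7])
print(f"unweighted: {unweighted:.4f}, weighted [1, 7]: {weighted:.4f}")
```

With the heavier weight on class 1, the misclassified minority-class sample dominates the loss, which is the usual motivation for this kind of weighting on imbalanced data.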

### Training

To train a model, run the `MegaGNN.main` module with Python and pass a configuration file via `--cfg`:

```bash
python -m MegaGNN.main --cfg configs/GIN/AML-Small-HI-GIN.yaml
```

### Custom Configurations

To create a custom configuration:

1. Copy an existing configuration file from `configs/`
2. Modify parameters as needed
3. Run with your new configuration file
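The three steps above can be sketched in Python as follows. This is only an illustration: the helper names `copy_config` and `training_command` and the destination path `configs/custom/my-experiment.yaml` are hypothetical, while the source config and the `--cfg` flag are the ones shown in the Training section. Step 2 (editing the parameters) is left to be done by hand in the copied file.

```python
import shlex
import shutil
import sys
from pathlib import Path


def copy_config(source, destination):
    """Step 1: copy an existing configuration file to a new location."""
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, dest)
    return dest


def training_command(config_path):
    """Step 3: build the training command for the given configuration file."""
    return [sys.executable, "-m", "MegaGNN.main", "--cfg", str(config_path)]


if __name__ == "__main__":
    base = Path("configs/GIN/AML-Small-HI-GIN.yaml")
    if base.is_file():
        # Hypothetical destination; pick any name under configs/.
        custom = copy_config(base, "configs/custom/my-experiment.yaml")
        # Step 2: edit `custom` by hand, then run the printed command.
        print(shlex.join(training_command(custom)))
    else:
        print(f"{base} not found; run this from the repository root.")
```

The script only prints the command instead of launching training, so the copied file can be edited first.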


## Key Components

- `MegaGNN/main.py`: Entry point for training and evaluation
- `MegaGNN/network/megagnn.py`: Core GNN model implementation
- `MegaGNN/datasets/`: Dataset implementations for various data sources
- `MegaGNN/layer/`: Custom GNN layer implementations
- `configs/`: Configuration files for different model variants and datasets

