# FILLING IN THE GAP: ACHIEVING ROBUST AND ADAPTIVE GNNS THROUGH POST-PROCESSING

We introduce FILLER (Framework for Integrating Layer-Level Edge-shift Recovery), a general post-processing method that can enhance any trained GNN to be robust against edge distribution shift. The project is configurable and supports multiple architectures and datasets, with the ability to repeat experiments and log the results in an organized manner.

The key idea of FILLER is to inject an Edge-shift Recovery (ER) layer into each GNN layer, addressing the representation gap caused by the edge distribution shift, thereby restoring the model’s original performance on test-time graphs. FILLER updates the ER layer through iterations, gradually improving the edge robustness of GNNs.

## Features

- Supports multiple GNN architectures: `GCN`, `GraphSAGE`, `SGC`, `GAT`, and `GIN`.
- Compatible with various datasets: `Cora`, `CiteSeer`, `PubMed`, `Computers`, `Photo`, `CS`, `Physics`, `ogbn-arxiv`, `Reddit`, and `Flickr`.
- Provides post-processing techniques to improve the models' robustness to edge distribution shift.
- Docker support for easy setup.
- Experiment result logging with various visualizations (`restore.jpg`, `remove.jpg`).

## Table of Contents

1. [Installation](#installation)
2. [Docker Setup](#docker-setup)
3. [Usage](#usage)
4. [Results Format](#results-format)
5. [Folder Structure](#folder-structure)

## Installation

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional; only needed for GPU execution)
- [PyTorch](https://pytorch.org/get-started/locally/) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)

## Docker Setup

A `Dockerfile` is provided for containerized execution. To use Docker:

### Build the Docker Image

```bash
docker build -t gnn-post-process .
```

### Run the Container

```bash
docker run --gpus all -v "$(pwd)":/app -it gnn-post-process bash
```

This will launch a container with all the necessary dependencies installed, and you can execute the training and post-processing scripts from within the container.

## Usage

### Running Evaluation

You can use the provided `run_exp.sh` script to run the framework's experiments. Make sure it is executable first:

```bash
chmod +x run_exp.sh
./run_exp.sh
```

### Command-line Interface

You can run the framework using the `main.py` script. Use the following arguments to configure your run:

```bash
python main.py --architecture <gcn|graphsage|sgc|gat|gin> --dataset <cora|citeseer|pubmed|...> --step <train|post_process|plot|all>
```
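
To run the same command over several architectures and datasets, the argument lists can be generated programmatically. The following is a minimal sketch that only builds the commands and prints them. It assumes `main.py` is in the current directory and uses only the flags documented in this README; the helper name `build_commands` is illustrative and not part of the project.

```python
import itertools
import sys


def build_commands(architectures, datasets, step="all"):
    """Return one argv list per (architecture, dataset) pair."""
    commands = []
    for arch, dataset in itertools.product(architectures, datasets):
        commands.append([
            sys.executable, "main.py",
            "--architecture", arch,
            "--dataset", dataset,
            "--step", step,
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["gcn", "gat"], ["cora", "citeseer"]):
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to launch it.
        print(" ".join(cmd))
```

Building the commands as lists rather than shell strings avoids quoting issues if they are later handed to `subprocess.run`.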

#### Key Arguments:

- `--architecture`: The GNN architecture to use (`gcn`, `graphsage`, `sgc`, `gat`, or `gin`).
- `--dataset`: The dataset to train and evaluate on.
- `--edge_ratio`: The proportion of edges used in training (default: 1.0).
- `--random_seed`: The random seed for reproducibility.
- `--repeat_num`: The number of experiment repetitions (default: 5).
- `--process_num`: The number of post-processing iterations (default: 5).
- `--pp_method`: The post-processing method to use (`simple` or `advanced`).
- `--step`: The step to execute (`train`, `post_process`, `plot`, or `all`).
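
For orientation, the arguments above could be declared with `argparse` roughly as follows. This is a sketch, not the project's actual parser: the defaults for `--edge_ratio`, `--repeat_num`, and `--process_num` come from the list above, while the defaults for `--random_seed`, `--pp_method`, and `--step` are assumptions made so the example runs.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Illustrative FILLER-style CLI")
    parser.add_argument("--architecture", required=True,
                        choices=["gcn", "graphsage", "sgc", "gat", "gin"])
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--edge_ratio", type=float, default=1.0)
    parser.add_argument("--random_seed", type=int, default=0)  # assumed default
    parser.add_argument("--repeat_num", type=int, default=5)
    parser.add_argument("--process_num", type=int, default=5)
    parser.add_argument("--pp_method", choices=["simple", "advanced"],
                        default="simple")  # assumed default
    parser.add_argument("--step", choices=["train", "post_process", "plot", "all"],
                        default="all")  # assumed default
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--architecture", "gcn", "--dataset", "cora"])
    print(vars(args))
```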


## Results Format

After running experiments, results are saved in a structured format as follows:

```
model_pth/
    {architecture}/
        {dataset}/
            {edge_ratio}/
                {random_seed}/
                    base_{train_num}.pth
                    pp_{pp_method}/
                        processed_{train_num}_{processed_num}.pth

results/
    {architecture}/
        {dataset}/
            {edge_ratio}/
                {random_seed}/
                    pp_{pp_method}/
                        restore.jpg
                        remove.jpg
                        result.json
                        results_mean_std.csv
```
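
When locating outputs from a script or notebook, the layout above can be reproduced with `pathlib`. The sketch below is illustrative: the helper names are hypothetical, and it assumes that `edge_ratio` and `random_seed` appear in the path as their plain string forms (for example `1.0` and `0`), which the project may format differently.

```python
from pathlib import Path


def result_dir(architecture, dataset, edge_ratio, random_seed, pp_method,
               root="results"):
    """Directory holding restore.jpg, remove.jpg, result.json, and the CSV."""
    return (Path(root) / architecture / dataset / str(edge_ratio)
            / str(random_seed) / f"pp_{pp_method}")


def processed_checkpoint(architecture, dataset, edge_ratio, random_seed,
                         pp_method, train_num, processed_num, root="model_pth"):
    """Path of a post-processed checkpoint under the model_pth tree."""
    base = result_dir(architecture, dataset, edge_ratio, random_seed,
                      pp_method, root=root)
    return base / f"processed_{train_num}_{processed_num}.pth"


if __name__ == "__main__":
    print(result_dir("gcn", "cora", 1.0, 0, "simple") / "result.json")
    print(processed_checkpoint("gcn", "cora", 1.0, 0, "simple", 0, 4))
```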

- **restore.jpg**: Visualization of edge addition (restoration) experiments.
- **remove.jpg**: Visualization of edge removal experiments.
- **result.json**: The experiment's raw results in JSON format.
- **results_mean_std.csv**: A summary of results with mean and standard deviation.
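
As a rough illustration of the mean and standard deviation summary, the sketch below aggregates one metric over repeated runs. The input list is a stand-in, since the schema of `result.json` is not described here, and the use of the sample standard deviation is an assumption; the project may compute the population value instead.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of run metrics."""
    mean = statistics.mean(values)
    # stdev needs at least two points; report 0.0 for a single run.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Hypothetical accuracies from three repetitions, for illustration only.
    mean, std = summarize([0.80, 0.82, 0.84])
    print(f"{mean:.4f} +/- {std:.4f}")
```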

## Folder Structure

- `train/`: Contains the training scripts for different datasets and batch processing methods.
- `data/`: Handles dataset loading.
- `post_process/`: Implements the post-processing methods.
- `experiment/`: Contains experiment-related scripts, including plotting functions.
- `model/`: Includes model utilities like saving and cloning.
- `results/`: Stores the experiment results in a structured format.