
# Recurrent Graph View Transformation (RGVT)

**Recurrent Graph View Transformation (RGVT)** extends **Graph View Transformation (GVT)** to encode *universal graph knowledge* in the **view space**, enabling transfer across graphs with arbitrary feature specifications.  
This repository provides the official implementation for pretraining RGVT on a source dataset and adapting it to **28 downstream benchmarks** via lightweight predictors.

---

## Overview

- **Two-stage pipeline (`main.py`)**
  - **Stage 1 (Pretraining):** Train RGVT on a large-scale source dataset and save checkpoints.  
  - **Stage 2 (Adaptation):** Freeze RGVT and train a lightweight predictor (linear / MLP) for downstream tasks.  

- **Datasets:** Unified loader (`load_dataset.py`) supporting **DGL**, **OGB**, **Yandex heterophilous**, and **PyG** benchmarks.  

- **Utilities:** Training loops, result summarization, and optional **Weights & Biases** logging.  

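The Stage 2 idea above (freeze the pretrained encoder, optimize only a small head) can be illustrated with a minimal PyTorch sketch. It is not the repository's code: the `encoder` here is a hypothetical stand-in built from plain linear layers that ignores graph structure, the feature and label tensors are random toy data, and the layer sizes are arbitrary. Only the learning rate of 0.005 mirrors the example commands in this README.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained RGVT encoder (assumption, not the real model).
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Freeze the encoder: no parameter receives gradients, and eval() fixes
# the behavior of layers such as dropout or batch norm.
for param in encoder.parameters():
    param.requires_grad_(False)
encoder.eval()

# Lightweight MLP predictor; only these parameters are handed to the optimizer.
predictor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(predictor.parameters(), lr=0.005)

# Toy node features and labels (random, for illustration only).
features = torch.randn(100, 16)
labels = torch.randint(0, 3, (100,))

# Snapshot the encoder weights so we can confirm they stay untouched.
encoder_before = [p.clone() for p in encoder.parameters()]

for _ in range(20):
    with torch.no_grad():  # the encoder is used purely as a feature extractor
        embeddings = encoder(features)
    loss = nn.functional.cross_entropy(predictor(embeddings), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

encoder_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(encoder_before, encoder.parameters())
)
print("encoder unchanged:", encoder_unchanged)
```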
---

## Scripts

To reproduce the principal experimental settings:

- `scripts/both_rgvt_mlp.sh` – pretraining followed by adaptation with an MLP predictor.  
- `scripts/adapt_rgvt_mlp.sh` – adaptation only (requires a pretrained checkpoint at `checkpoints/mlp/seed_42.pth`).  

---

## Outputs

- **Pretraining:** RGVT checkpoints → `checkpoints/`  
- **Adaptation:** Evaluation reports per dataset → `results/<timestamp>/`  

---

## Installation

```bash
pip install torch dgl torch-geometric ogb scikit-learn numpy matplotlib tqdm wandb
```

> ⚠️ Ensure the installed **DGL build** matches your **CUDA version**.

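One way to act on the warning above is to print the installed versions before training. The helper below is a sketch and not part of the repository. It assumes the packages were installed under the distribution names `torch` and `dgl`, and that a CUDA-enabled DGL wheel carries a CUDA suffix in its version string (for example `+cu118`), which may not hold for every build. Missing packages are reported as `None` rather than raising.

```python
import importlib.metadata as metadata
import importlib.util


def installed_version(package):
    """Return the installed distribution version string, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_environment():
    """Collect the version info relevant to matching DGL against CUDA."""
    info = {
        "torch": installed_version("torch"),
        "dgl": installed_version("dgl"),
        "torch_cuda": None,
    }
    # Import torch only if it is installed, so the script also runs in a bare environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


for name, value in report_environment().items():
    print(f"{name}: {value}")
```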
---

## Repository Structure

- `main.py` – entry point coordinating pretraining, adaptation, and logging.
- `stages/` – encapsulated stage logic:
  - `stage1.py`: RGVT pretraining
  - `stage2.py`: multi-dataset adaptation
- `model/` – RGVT modules (`rgvt.py`, `gvt.py`) and downstream predictors.
- `utils/` – helpers for seeding, training, evaluation, and WandB integration.
- `load_dataset.py` – dataset registry, download utilities, split handling, feature statistics.
- `scripts/` – experiment configurations.
- `checkpoints/`, `results/`, `splits/`, `wandb_logs/` – default directories for outputs.

---

## Quick Start

### Full pipeline with MLP predictor

```bash
python main.py --mode both --predictor_type mlp
```

### Pretrain RGVT (example: OGBN-Arxiv)

```bash
python main.py --mode pretrain \
  --datasetA 27_ogbn_arxiv \
  --learning_rate 0.005 \
  --predictor_type mlp
```

### Adapt pretrained RGVT with MLP predictor

```bash
python main.py --mode adaptation \
  --checkpoint checkpoints/mlp/seed_42.pth \
  --predictor_type mlp \
  --learning_rate 0.005
```

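To make the relationship between the flags in the commands above easier to see, here is a hypothetical `argparse` sketch. It is not taken from `main.py`: the flag names come from the commands in this README, but the defaults, the `linear` choice string, and the `validate` helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the Quick Start commands."""
    parser = argparse.ArgumentParser(description="RGVT pretraining / adaptation (sketch)")
    parser.add_argument("--mode", choices=["pretrain", "adaptation", "both"], default="both")
    # "linear" is an assumed spelling for the linear predictor option.
    parser.add_argument("--predictor_type", choices=["linear", "mlp"], default="mlp")
    parser.add_argument("--datasetA", type=str, default=None, help="source dataset for pretraining")
    parser.add_argument("--learning_rate", type=float, default=0.005)
    parser.add_argument("--checkpoint", type=str, default=None, help="path to a pretrained RGVT checkpoint")
    return parser


def validate(args):
    """Adaptation alone skips pretraining, so it needs an existing checkpoint."""
    if args.mode == "adaptation" and args.checkpoint is None:
        raise SystemExit("--mode adaptation requires --checkpoint")


# Parse the same arguments as the pretraining command above.
example = build_parser().parse_args(
    ["--mode", "pretrain", "--datasetA", "27_ogbn_arxiv",
     "--learning_rate", "0.005", "--predictor_type", "mlp"]
)
validate(example)
print(example.mode, example.datasetA, example.learning_rate)
```

Under these assumptions, `--mode both` needs no `--checkpoint` because pretraining produces one, while `--mode adaptation` fails fast when the checkpoint path is missing.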
