
# Recurrent Graph View Transformation (RGVT)

Recurrent Graph View Transformation (RGVT) extends Graph View Transformation (GVT) in the view space to learn node representations for node classification. The view space is a graph-induced representation space in which graphs with heterogeneous feature specifications can be represented and processed uniformly, so knowledge can transfer across arbitrary graph datasets. This repository provides the official implementation for pretraining RGVT on a source dataset (OGBN-Arxiv) and evaluating its transferability on 28 downstream benchmarks using lightweight predictors.

---

## Overview

- **Two-stage pipeline (`main.py`)**
  - **Stage 1 (Pretraining):** Train RGVT on a large-scale source dataset and save checkpoints.  
  - **Stage 2 (Adaptation):** Freeze RGVT and train a lightweight predictor (linear / MLP) for downstream tasks.  

- **Datasets:** Unified loader (`load_dataset.py`) supporting **DGL**, **OGB**, **Yandex heterophilous**, and **PyG** benchmarks.  

- **Utilities:** Training loops, result summarization, and optional **Weights & Biases** logging.  
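
The freeze-then-adapt idea behind Stage 2 can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the repository's code: `encoder` is a hypothetical stand-in for a pretrained RGVT model (the real one lives in `model/` and operates on graphs), the features and labels are synthetic, and the layer sizes and epoch count are arbitrary. Only the learning rate of 0.005 is borrowed from the Quick Start commands.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained encoder (NOT the actual RGVT model).
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Freeze the encoder: disable gradients and switch to eval mode.
for param in encoder.parameters():
    param.requires_grad_(False)
encoder.eval()

# Lightweight MLP predictor trained on top of the frozen embeddings.
predictor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(predictor.parameters(), lr=0.005)

# Synthetic node features and labels, for illustration only.
features = torch.randn(100, 16)
labels = torch.randint(0, 3, (100,))

# Snapshot of the encoder weights, used to confirm they never change.
frozen_before = [p.clone() for p in encoder.parameters()]

for epoch in range(20):
    with torch.no_grad():  # no graph is built for the frozen encoder
        embeddings = encoder(features)
    logits = predictor(embeddings)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

encoder_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(frozen_before, encoder.parameters())
)
print("encoder unchanged:", encoder_unchanged)
```

Passing only `predictor.parameters()` to the optimizer is what keeps adaptation lightweight: the encoder contributes a forward pass and nothing else.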

---

## Scripts

To reproduce the principal experimental settings:

- `scripts/both_rgvt_mlp.sh` – pretraining followed by adaptation with an MLP predictor.  
- `scripts/adapt_rgvt_mlp.sh` – adaptation only (requires an existing checkpoint at `checkpoints/mlp/seed_42.pth`).  
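
Because the adaptation-only script depends on a checkpoint produced earlier, a pre-flight check along the following lines can fail fast before any dataset is loaded. This is a hedged sketch: `plan_stages` is a hypothetical helper, not a function from this repository, and the mode names simply mirror the `--mode` values used in this README.

```python
from pathlib import Path


def plan_stages(mode, checkpoint=None):
    """Return the ordered stage names for a run mode.

    Hypothetical helper: raises early when adaptation-only is requested
    without an existing checkpoint file.
    """
    if mode == "both":
        return ["pretrain", "adaptation"]
    if mode == "pretrain":
        return ["pretrain"]
    if mode == "adaptation":
        if checkpoint is None or not Path(checkpoint).is_file():
            raise FileNotFoundError(
                f"adaptation needs an existing checkpoint, got: {checkpoint}"
            )
        return ["adaptation"]
    raise ValueError(f"unknown mode: {mode}")


print(plan_stages("both"))
```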

---

## Outputs

- **Pretraining:** RGVT checkpoints → `checkpoints/`  
- **Adaptation:** Evaluation reports per dataset → `results/<timestamp>/`  

---

## Installation

```bash
pip install torch dgl torch-geometric ogb scikit-learn numpy matplotlib tqdm wandb
```

> ⚠️ Ensure the installed **DGL build** matches your **CUDA version**.
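
One way to inspect the installed builds is a short script like the one below. It is a sketch rather than an official check: `report_builds` is a hypothetical helper, and it only prints version strings for manual comparison. `torch.version.cuda` is `None` on CPU-only PyTorch builds; CUDA-enabled DGL wheels often carry a suffix such as `+cu118` in their version string, but that convention is an assumption worth confirming for your install.

```python
import importlib


def report_builds():
    """Collect version info for torch and dgl without failing if one is missing."""
    report = {}
    for name in ("torch", "dgl"):
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # missing package or a broken binary build
            report[name] = f"not importable ({type(exc).__name__})"
    try:
        import torch

        # CUDA version PyTorch was compiled against; None for CPU-only builds.
        report["torch_cuda"] = torch.version.cuda
    except Exception:
        report["torch_cuda"] = None
    return report


for key, value in report_builds().items():
    print(f"{key}: {value}")
```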

---

## Repository Structure

- `main.py` – entry point coordinating pretraining, adaptation, and logging.
- `stages/` – encapsulated stage logic:
  - `stage1.py`: RGVT pretraining
  - `stage2.py`: multi-dataset adaptation
- `model/` – RGVT modules (`rgvt.py`, `gvt.py`) and downstream predictors.
- `utils/` – helpers for seeding, training, evaluation, and WandB integration.
- `load_dataset.py` – dataset registry, download utilities, split handling, feature statistics.
- `scripts/` – example scripts.

---

## Quick Start

### Full pipeline with MLP predictor

```bash
python main.py --mode both --predictor_type mlp
```

### Pretrain RGVT (example: OGBN-Arxiv)

```bash
python main.py --mode pretrain \
  --datasetA 27_ogbn_arxiv \
  --learning_rate 0.005 \
  --predictor_type mlp
```

### Adapt pretrained RGVT with MLP predictor

```bash
python main.py --mode adaptation \
  --checkpoint checkpoints/mlp/seed_42.pth \
  --predictor_type mlp \
  --learning_rate 0.005
```
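
For readers who want to see how the flags in the commands above fit together, here is a hypothetical reconstruction of the command-line interface using `argparse`. Only the flag names and the example values come from this README; the defaults, the types, and the `linear` choice value are assumptions, and the actual parser in `main.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the CLI; defaults and choices are assumed."""
    parser = argparse.ArgumentParser(description="RGVT pipeline (illustrative sketch)")
    parser.add_argument("--mode", choices=["pretrain", "adaptation", "both"], default="both")
    # "linear" is an assumed value name for the linear predictor.
    parser.add_argument("--predictor_type", choices=["linear", "mlp"], default="mlp")
    parser.add_argument("--datasetA", default="27_ogbn_arxiv")
    parser.add_argument("--learning_rate", type=float, default=0.005)
    parser.add_argument("--checkpoint", default=None)
    return parser


# Parse the adaptation command shown above from an explicit argument list.
args = build_parser().parse_args([
    "--mode", "adaptation",
    "--checkpoint", "checkpoints/mlp/seed_42.pth",
    "--predictor_type", "mlp",
    "--learning_rate", "0.005",
])
print(args.mode, args.predictor_type, args.learning_rate, args.checkpoint)
```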

