# DRIK: Distribution-Robust Inductive Kriging without Information Leakage

This repository contains the implementation of "DRIK: Distribution-Robust Inductive Kriging without Information Leakage", a graph neural network framework for inductive spatio-temporal kriging, i.e., estimating values at locations that have no sensor (and are unseen during training) from a sparse sensor network.

> **Abstract**: *Inductive kriging supports high-resolution spatio-temporal estimation with sparse sensor networks, but conventional training–evaluation setups often suffer from information leakage and poor out-of-distribution (OOD) generalization. We find that the common 2×2 spatio-temporal split allows test data to influence model selection through early stopping, obscuring the true OOD characteristics of inductive kriging. To address this issue, we propose a 3×3 partition that cleanly separates training, validation, and test sets, eliminating leakage and better reflecting real-world applications. Building on this redefined setting, we introduce DRIK, a Distribution-Robust Inductive Kriging approach designed with the intrinsic properties of inductive kriging in mind to explicitly enhance OOD generalization, employing a three-tier strategy at the node, edge, and subgraph levels. DRIK perturbs node coordinates to capture continuous spatial relationships, drops edges to reduce ambiguity in information flow and increase topological diversity, and adds pseudo-labeled subgraphs to strengthen domain generalization. Experiments on six diverse spatio-temporal datasets show that DRIK consistently outperforms existing methods, achieving up to 12.48% lower MAE while maintaining strong scalability.*
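The abstract contrasts the common 2×2 split with a 3×3 partition. Below is a minimal sketch of one way to build such a partition. It rests on our own assumptions rather than the repo's code: nodes are shuffled into three disjoint groups (train/val/test), time steps are cut into three consecutive periods, and each stage's targets are its matching (node group, period) cell. The function `split_3x3` and the 60/20/20 ratios are hypothetical. Which other cells are fed as observed inputs at each stage is not modeled here.

```python
import random


def split_3x3(num_nodes, num_steps, ratios=(0.6, 0.2, 0.2), seed=0):
    """Hypothetical 3x3 partition: returns {stage: (target_nodes, target_steps)}.

    Assumption: nodes are shuffled into three disjoint groups, time steps are cut
    into three consecutive periods, and each stage targets its own diagonal cell.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)  # spatial split: random, disjoint node groups

    def cut(n):
        # Boundaries of the first two groups; the last group takes the remainder.
        a = int(ratios[0] * n)
        b = a + int(ratios[1] * n)
        return a, b

    na, nb = cut(num_nodes)
    ta, tb = cut(num_steps)
    node_groups = [sorted(nodes[:na]), sorted(nodes[na:nb]), sorted(nodes[nb:])]
    # Temporal split: chronological periods, so validation precedes testing.
    time_groups = [range(0, ta), range(ta, tb), range(tb, num_steps)]
    stages = ("train", "val", "test")
    return {s: (node_groups[i], time_groups[i]) for i, s in enumerate(stages)}


blocks = split_3x3(num_nodes=100, num_steps=1000)
val_nodes, val_steps = blocks["val"]
test_nodes, test_steps = blocks["test"]
# Validation (used for early stopping) shares no nodes and no time steps with test.
assert not set(val_nodes) & set(test_nodes)
assert not set(val_steps) & set(test_steps)
```

Because the validation cell shares neither nodes nor time steps with the test cell, early stopping on validation error cannot be steered by test targets, which is the leakage the abstract attributes to the 2×2 split.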
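The node- and edge-level strategies in the abstract (perturbing node coordinates, dropping edges) can be illustrated with a hedged sketch. Assumptions not taken from the repo: each node has 2-D coordinates, the graph is a thresholded Gaussian kernel over pairwise distances, coordinates receive isotropic Gaussian noise before the graph is rebuilt, and each edge is dropped independently with probability `p`. All function names and parameters (`sigma`, `bandwidth`, `threshold`, `p`) are hypothetical, and the subgraph-level pseudo-labeling step is not covered.

```python
import math
import random


def perturb_coordinates(coords, sigma, rng):
    """Node level: add isotropic Gaussian noise (std `sigma`) to each (x, y)."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in coords]


def kernel_edges(coords, bandwidth, threshold=0.1):
    """Directed edges (i, j) -> exp(-(d_ij / bandwidth)^2), kept if >= threshold."""
    edges = {}
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i == j:
                continue
            weight = math.exp(-(math.dist(ci, cj) / bandwidth) ** 2)
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


def drop_edges(edges, p, rng):
    """Edge level: keep each edge independently with probability 1 - p."""
    return {e: w for e, w in edges.items() if rng.random() >= p}


def augment_graph(coords, sigma=0.05, bandwidth=1.0, p=0.2, seed=None):
    """One augmented graph: perturb coordinates, rebuild weights, then drop edges."""
    rng = random.Random(seed)
    noisy = perturb_coordinates(coords, sigma, rng)
    return drop_edges(kernel_edges(noisy, bandwidth), p, rng)


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(augment_graph(coords, seed=0))
```

Rebuilding edge weights from perturbed coordinates makes them vary smoothly with distance rather than being fixed, while edge dropping yields a different topology per sample. In training, a fresh augmented graph would typically be drawn for each step; how often DRIK actually resamples is not specified here.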

## Dependencies

- Python 3.8
- PyTorch 1.8.1
- PyTorch Lightning 1.4.0
- CUDA 11.3
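
Create the conda environment from the file matching your OS:
```
conda env create -f env_ubuntu.yaml    # Ubuntu
conda env create -f env_windows.yaml   # Windows
```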
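To confirm that the active environment matches the pinned versions, a small check like the one below may help. It is a sketch, not a script shipped with the repo: `installed_version` and `check_environment` are hypothetical names, and it assumes each package exposes `__version__` (local build suffixes such as `+cu111` are stripped before comparing). CUDA is reported through `torch.version.cuda` when PyTorch is importable.

```python
import importlib

# Versions pinned in this README (module import name -> expected version).
EXPECTED = {"torch": "1.8.1", "pytorch_lightning": "1.4.0"}


def installed_version(module_name):
    """Return module.__version__ without any '+local' suffix, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return version.split("+")[0] if version else None


def check_environment(expected=EXPECTED):
    """Map each module name to (installed, expected, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (have, want, have == want)
    return report


if __name__ == "__main__":
    for name, (have, want, ok) in check_environment().items():
        print(f"{name}: installed={have} expected={want} {'OK' if ok else 'MISMATCH'}")
    try:
        import torch
        print("CUDA (torch build):", torch.version.cuda, "available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed")
```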

## Datasets

We use six datasets from three domains:
- Traffic speed datasets:
    - METR-LA
    - PEMS-BAY
 
- Solar power datasets:
    - NREL-AL
    - NREL-MD
  
- Air quality datasets:
    - AQI36
    - AQI

The datasets can be downloaded as [datasets.zip](https://drive.google.com/file/d/1VQrSLNAr3qr2LAsEK1-_CbBbu6vr0G63/view?usp=sharing); extract the archive into the current directory.

## Usage

- **Training**:
    - E.g., train DRIK on the METR-LA dataset:
      ```
      python train.py --config config/drik/la_point.yaml
      ```

- **Testing**:
    - E.g., test DRIK with a pretrained checkpoint:
      ```
      python train.py --config config/drik/la_point.yaml --pretrained-model path/to/checkpoint.ckpt
      ```

## References

This repo is built mainly on [KITS](https://github.com/Sam1224/KITS). Thanks for their great work!
