# Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks

## Introduction

This is the reference PyTorch implementation of the paper:\
*Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks*.


## Authors
*Currently Anonymous*

## Requirements
* `python >= 3.7` and `PyTorch >= 1.4`. Refer to their official websites for installation details.
* Other dependencies:
```{bash}
pandas==0.24.2
tqdm==4.41.1
numpy==1.16.4
scikit_learn==0.22.1
matplotlib==3.3.1
```
Refer to `environment.yml` for more details.


## Dataset and preprocessing
#### Option 1: Use our preprocessed data
We provide preprocessed datasets: Reddit, Wikipedia, Enron, Social Evolution, and UCI. Download them from [here](https://drive.google.com/drive/folders/1umS1m1YbOM10QOyVbGwtXrsiK3uTD7xQ?usp=sharing) to `processed/`, then unzip:
```{bash}
cd processed/
unzip data.zip
```

#### Option 2: Preprocess the data using `process.py`
First, download the public data to `processed/`, for example:
* [Reddit](http://snap.stanford.edu/jodie/reddit.csv)

* [Wikipedia](http://snap.stanford.edu/jodie/wikipedia.csv)

Next, run the following command:
```{bash}
python process.py --dataset <dataset>
```

Features are saved in the dense binary `npy` format. If edge features or node features are absent, they are replaced by vectors of zeros.


#### Option 3: Use your own data
Put your data under the `processed` folder. The required input files are `ml_${DATA_NAME}.csv`, `ml_${DATA_NAME}.npy`, and `ml_${DATA_NAME}_node.npy`. They store the edge linkages, edge features, and node features, respectively.

The `CSV` file has the following columns:
```
u, i, ts, label, idx
```
These represent the source node index, target node index, timestamp, edge label, and edge index, respectively.

`ml_${DATA_NAME}.npy` has shape `[#temporal edges + 1, edge feature dimension]`. Similarly, `ml_${DATA_NAME}_node.npy` has shape `[#nodes + 1, node feature dimension]`.


All node indices start from `1`. The zero index is reserved for `null` during padding operations, so the maximum node index equals the total number of nodes. Similarly, the maximum edge index equals the total number of temporal edges. The padding (null) embedding is a vector of zeros.
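
The layout above can be illustrated with a minimal sketch that writes a toy dataset. The helper name `write_toy_dataset`, the dataset name `toy`, and the feature dimensions are hypothetical and not part of this repository. Writing the CSV with a header row and without an extra pandas index column is also an assumption; check it against the loader in `main.py` before relying on it.

```python
import os

import numpy as np
import pandas as pd


def write_toy_dataset(out_dir, data_name="toy", edge_dim=4, node_dim=4):
    """Write a tiny dataset in the three-file layout described above (illustrative only)."""
    # Three temporal edges among three nodes; node and edge indices start at 1.
    edges = pd.DataFrame({
        "u": [1, 2, 1],
        "i": [2, 3, 3],
        "ts": [0.0, 10.0, 25.0],
        "label": [0, 0, 0],
        "idx": [1, 2, 3],
    })
    n_edges = len(edges)
    n_nodes = int(max(edges["u"].max(), edges["i"].max()))

    # Row 0 is the reserved null/padding row and stays all zeros.
    edge_feat = np.zeros((n_edges + 1, edge_dim), dtype=np.float32)
    edge_feat[1:] = np.random.RandomState(0).normal(size=(n_edges, edge_dim))

    # This toy graph has no node features, so every row is a zero vector.
    node_feat = np.zeros((n_nodes + 1, node_dim), dtype=np.float32)

    # Assumption: header row present, no extra index column.
    edges.to_csv(os.path.join(out_dir, f"ml_{data_name}.csv"), index=False)
    np.save(os.path.join(out_dir, f"ml_{data_name}.npy"), edge_feat)
    np.save(os.path.join(out_dir, f"ml_{data_name}_node.npy"), node_feat)
    return edges, edge_feat, node_feat
```

Calling `write_toy_dataset("processed")` would produce `ml_toy.csv`, `ml_toy.npy`, and `ml_toy_node.npy`, with one more feature row than there are edges or nodes because of the reserved zero index.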


## Training Command

#### Examples:

* To train **CAW-N-mean** on the Wikipedia dataset in inductive mode, sampling 64 length-2 CAWs per node with alpha = 1e-5:
```bash
python main.py -d wikipedia --pos_dim 108 --agg walk --bs 32 --n_degree 64 1 --mode i --bias 1e-5 --pos_enc lp --walk_pool sum --seed 0
```

* To train **CAW-N-attn** on the UCI dataset in transductive mode, sampling 32 length-1 CAWs per node with alpha = 1e-6 and a different random seed (123):
```bash
python main.py -d uci --pos_dim 100 --agg walk --bs 32 --n_degree 32 --n_layer 1 --mode t --bias 1e-6 --pos_enc lp --walk_pool attn --seed 123
```

Detailed logs are written to `log/`. A one-line summary of the evaluation result is also written to `log/oneline_summary.log` upon completion.
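
To repeat a run over several random seeds, one option is to generate the command lines programmatically. The sketch below reuses the flags from the Wikipedia example and varies only `--seed`; the helper name `build_commands` is hypothetical and not part of this repository.

```python
# Flags copied from the Wikipedia example above; only --seed varies.
BASE_ARGS = [
    "python", "main.py", "-d", "wikipedia", "--pos_dim", "108", "--agg", "walk",
    "--bs", "32", "--n_degree", "64", "1", "--mode", "i", "--bias", "1e-5",
    "--pos_enc", "lp", "--walk_pool", "sum",
]


def build_commands(seeds):
    """Return one argument list per seed, suitable for subprocess.run."""
    return [BASE_ARGS + ["--seed", str(seed)] for seed in seeds]


if __name__ == "__main__":
    # Print the commands instead of launching them.
    for cmd in build_commands([0, 1, 2]):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another.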
 
## Usage Summary
```txt
usage: Interface for Inductive Dynamic Representation Learning for Link Prediction on Temporal Graphs
       [-h] [-d {wikipedia,reddit,socialevolve,socialevolve_1month,socialevolve_2weeks,enron,uci}] [-m {t,i}]
       [--n_degree [N_DEGREE [N_DEGREE ...]]] [--n_layer N_LAYER] [--bias BIAS] [--agg {tree,walk}] [--pos_enc {spd,lp}]
       [--pos_dim POS_DIM] [--walk_pool {attn,sum}] [--walk_n_head WALK_N_HEAD] [--walk_mutual]
       [--attn_agg_method {attn,lstm,mean}] [--attn_mode {prod,map}] [--attn_n_head ATTN_N_HEAD] [--time {time,pos,empty}]
       [--n_epoch N_EPOCH] [--bs BS] [--lr LR] [--drop_out DROP_OUT] [--tolerance TOLERANCE] [--ngh_cache] [--gpu GPU]
       [--cpu_cores CPU_CORES] [--verbosity VERBOSITY] [--seed SEED]
```

## Optional arguments
```{txt}
  -h, --help            show this help message and exit
  -d {wikipedia,reddit,socialevolve,socialevolve_1month,socialevolve_2weeks,enron,uci}, --data {wikipedia,reddit,socialevolve,socialevolve_1month,socialevolve_2weeks,enron,uci}
                        data sources to use, try wikipedia or reddit
  -m {t,i}, --mode {t,i}
                        transductive (t) or inductive (i)
  --n_degree [N_DEGREE [N_DEGREE ...]]
                        a list of neighbor sampling numbers for different hops, when only a single element is input n_layer
                        will be activated
  --n_layer N_LAYER     number of network layers
  --bias BIAS           the hyperparameter alpha controlling sampling preference with time closeness, default to 0 which is
                        uniform sampling
  --agg {tree,walk}     tree based hierarchical aggregation or walk-based flat lstm aggregation
  --pos_enc {spd,lp}    way to encode distances, shortest-path distance or landing probabilities
  --pos_dim POS_DIM     dimension of the positional embedding
  --walk_pool {attn,sum}
                        how to pool the encoded walks, using attention or simple sum, if sum will overwrite all the other
                        walk_ arguments
  --walk_n_head WALK_N_HEAD
                        number of heads to use for walk attention
  --walk_mutual         whether to do mutual query for source and target node random walks
  --attn_agg_method {attn,lstm,mean}
                        local aggregation method, we only use the default here
  --attn_mode {prod,map}
                        use dot product attention or mapping based, we only use the default here
  --attn_n_head ATTN_N_HEAD
                        number of heads used in tree-shaped attention layer, we only use the default here
  --time {time,pos,empty}
                        how to use time information, we only use the default here
  --n_epoch N_EPOCH     number of epochs
  --bs BS               batch_size
  --lr LR               learning rate
  --drop_out DROP_OUT   dropout probability for all dropout layers
  --tolerance TOLERANCE
                        tolerated marginal improvement for early stopper
  --ngh_cache           (currently not suggested due to overwhelming memory consumption) cache temporal neighbors previously
                        calculated to speed up repeated lookup
  --gpu GPU             which gpu to use
  --cpu_cores CPU_CORES
                        number of cpu_cores used for position encoding
  --verbosity VERBOSITY
                        verbosity of the program output
  --seed SEED           random seed for all randomized algorithms
```
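
The effect of `--bias` (alpha) on neighbor sampling can be illustrated with a small sketch. It assumes that a past interaction at time `t'` is sampled with weight proportional to `exp(alpha * (t' - t))`, where `t` is the current time, so that `alpha = 0` reduces to uniform sampling as stated in the help text. The function name and arguments are hypothetical and this is not the sampler used in this repository.

```python
import numpy as np


def sample_temporal_neighbors(ngh_ts, cut_time, alpha, n_samples, rng=None):
    """Sample indices of past interactions, favoring recent ones when alpha > 0."""
    rng = rng or np.random.RandomState(0)
    ngh_ts = np.asarray(ngh_ts, dtype=np.float64)

    # Causality: only interactions strictly before cut_time are eligible.
    valid = np.flatnonzero(ngh_ts < cut_time)
    if valid.size == 0:
        return np.array([], dtype=np.int64)

    # Weight proportional to exp(alpha * (t_neighbor - cut_time)).
    # Subtracting the largest exponent keeps the computation numerically stable.
    logits = alpha * (ngh_ts[valid] - cut_time)
    weights = np.exp(logits - logits.max())
    probs = weights / weights.sum()

    # Sample with replacement, so n_samples may exceed the number of neighbors.
    return rng.choice(valid, size=n_samples, replace=True, p=probs)
```

With `alpha = 0` every eligible interaction has the same probability. As alpha grows, samples concentrate on the interactions closest in time to `cut_time`.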

## Acknowledgement
Our implementation is based on, and extensively modifies, the pipeline used [here](https://drive.google.com/drive/folders/1GaH8vusCXJj4ucayfO-PyHpnNsJRkB78). We thank the authors for sharing their code.

## Cite us
```text
@inproceedings{
anonymous2021inductive,
title={Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks},
author={Anonymous},
booktitle={Submitted to International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=KYPz4YsCPj},
note={under review}
}
```

