
#  VCAUSE

Code repository for the paper "Variational Causal Autoencoder for Interventional and Counterfactual Queries" (VCAUSE).
The implementation is based on [PyTorch](https://pytorch.org/),
 [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/) and
 [PyTorch Lightning](https://www.pytorchlightning.ai/). The repository contains the resources needed to run the
experiments of the paper. The German dataset is not included; follow the instructions below to download it.

## Installation
Create a conda environment and activate it:

```
conda create --name vcause python=3.7.9 --no-default-packages
conda activate vcause 
```

Install PyTorch:

```
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cpuonly -c pytorch
```

Install PyTorch Geometric (in this order):
```
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-geometric
```
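
After the install steps above, it can help to confirm that every package is visible in the active environment. The snippet below is a minimal sketch, not part of the repository: the helper `find_missing` is a hypothetical name, and the list contains the import names that correspond to the pip packages above. It only looks the modules up and does not import them.

```python
import importlib.util

# Import names of PyTorch and the PyTorch Geometric packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("PyTorch and all PyTorch Geometric packages were found.")
```

If a package is reported as missing, rerun the corresponding `pip install` line inside the `vcause` environment.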



Finally, install the following:

```
pip install pytorch-lightning --quiet
pip install -r requirements.txt
```
**Note**: The German dataset is not contained in this repository. The first time you try to train on the German dataset,
you will get an error with instructions on how to download and store it. Follow those instructions and rerun the
command.


## Run our model

Everything needed to run the experiments shown in the paper is contained in the script `main.py`.
The script provides a command-line interface. To list its options, run:

```
python main.py --help
```

To train our standard VCAUSE algorithm on the SCM triangle with linear equations:


```
python main.py --dataset_file _params/dataset_toy_triangle.yaml --model_file _params/model_vcause.yaml -d equations_type=linear
```


To train our heterogeneous VCAUSE algorithm on the German dataset:
```
python main.py --dataset_file _params/dataset_real_german.yaml --model_file _params/model_vcause.yaml  -m is_heterogeneous=1
```



To load a trained model:
 - disable training with `-i 0`,
 - pass the configuration file of the trained model, i.e. `hparams_full.yaml`, with `--yaml_file`.
```
python main.py --yaml_file=PATH/hparams_full.yaml -i 0
```
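
When several models have been trained, locating each `hparams_full.yaml` by hand gets tedious. The following is a minimal sketch under the assumption that every run stores its `hparams_full.yaml` somewhere below a single results directory (the `exper_test` default used here is only an example value). The helpers `find_trained_configs` and `load_command` are hypothetical names, not part of the repository. The sketch prints the load commands and does not execute them.

```python
from pathlib import Path


def find_trained_configs(root_dir):
    """Return all hparams_full.yaml files below root_dir, sorted by path."""
    root = Path(root_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("hparams_full.yaml"))


def load_command(yaml_path):
    """Build the command that loads a trained model without retraining (-i 0)."""
    return ["python", "main.py", f"--yaml_file={yaml_path}", "-i", "0"]


# Assumed results directory; replace it with the one used for training.
for config_path in find_trained_configs("exper_test"):
    print(" ".join(load_command(config_path)))
```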


## Run Baselines

To train the MultiCVAE baseline, for example on the SCM triangle with linear equations:

```
python main.py --dataset_file _params/dataset_toy_triangle.yaml --model_file _params/model_multicvae.yaml -d equations_type=linear
```


To train the CAREFL baseline, for example on the SCM triangle with linear equations:
```
python main.py --dataset_file _params/dataset_toy_triangle.yaml --model_file _params/model_carefl.yaml -d equations_type=linear
```


## Load a model and train/evaluate counterfactual fairness
Load your model and add the flag `--eval_fair`. For example:

```
python main.py --yaml_file=PATH/hparams_full.yaml -i 0 --eval_fair --show_results 0
```

With the full path of a model trained on the German dataset, the call looks like this:

```
python main.py --yaml_file=exper_test/german_800_100_100_std_None/vcause_piwae/dgnn_elbo_iwae_3_16_16_16_2_delta_normal_0.05_0.0_0.2_1/adam/0.005_0.9_0.999_1.2e-06_exp_lr_0.99/8/hparams_full.yaml -i 0 --eval_fair --show_results 0
```

## Choose different configurations

For training on a synthetic SCM, select the type of structural equations:
 - for linear (LIN) equations pass `-d equations_type=linear`,
 - for non-linear (NLIN) equations pass `-d equations_type=non-linear`,
 - for non-additive (NADD) equations pass `-d equations_type=non-additive`.

For training on different SCMs, select the dataset file:
- for triangle `--dataset_file _params/dataset_toy_triangle.yaml`, 
- for chain `--dataset_file _params/dataset_toy_chain.yaml`, 
- for collider `--dataset_file _params/dataset_toy_collider.yaml`, 
- for M-graph `--dataset_file _params/dataset_toy_mgraph.yaml`, 
- for loan `--dataset_file _params/dataset_toy_loan.yaml`, 
- for German dataset `--dataset_file _params/dataset_real_german.yaml`.

For plotting results use `--plots 1`.
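
The options above can be combined mechanically. The snippet below is a minimal sketch that builds one training command per SCM and equation type. The helper `build_command` is a hypothetical name, and the sketch assumes that triangle, chain, collider and M-graph each accept all three values of `equations_type`. Loan and German are left out because this README does not state whether `equations_type` applies to them. The commands are printed, not executed.

```python
import itertools

# Synthetic SCMs and equation types taken from the lists above.
SCMS = ["triangle", "chain", "collider", "mgraph"]
EQUATION_TYPES = ["linear", "non-linear", "non-additive"]


def build_command(scm, equations_type, model_file="_params/model_vcause.yaml", plots=False):
    """Build the argument list for training one model on one synthetic SCM."""
    cmd = [
        "python", "main.py",
        "--dataset_file", f"_params/dataset_toy_{scm}.yaml",
        "--model_file", model_file,
        "-d", f"equations_type={equations_type}",
    ]
    if plots:
        cmd += ["--plots", "1"]
    return cmd


commands = [build_command(scm, eq) for scm, eq in itertools.product(SCMS, EQUATION_TYPES)]
for cmd in commands:
    print(" ".join(cmd))
```

Passing a different `model_file`, for example `_params/model_carefl.yaml`, yields the corresponding baseline commands.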

For the sake of completeness, the output of `python main.py --help` is:


```
usage: main.py [-h] [--dataset_file DATASET_FILE] [--model_file MODEL_FILE]
               [--trainer_file TRAINER_FILE] [--yaml_file YAML_FILE]
               [-d KEY1=VAL1,KEY2=VAL2...] [-m KEY1=VAL1,KEY2=VAL2...]
               [-o KEY1=VAL1,KEY2=VAL2...] [-t KEY1=VAL1,KEY2=VAL2...]
               [-s SEED] [-r ROOT_DIR] [-i IS_TRAINING] [-f]
               [--show_results SHOW_RESULTS] [--cf_sample] [--plots PLOTS]

optional arguments:
  -h, --help            show this help message and exit
  --dataset_file DATASET_FILE
                        path to configuration file for the dataset
  --model_file MODEL_FILE
                        path to configuration file for the dataset
  --trainer_file TRAINER_FILE
                        path to configuration file for the training
  --yaml_file YAML_FILE
                        path to trained model configuration
  -d KEY1=VAL1,KEY2=VAL2..., --dataset_dict KEY1=VAL1,KEY2=VAL2...
                        manually define dataset configurations as string:
                        KEY1=VALUE1+KEY2=VALUE2+...
  -m KEY1=VAL1,KEY2=VAL2..., --model_dict KEY1=VAL1,KEY2=VAL2...
                        manually define model configurations as string:
                        KEY1=VALUE1+KEY2=VALUE2+...
  -o KEY1=VAL1,KEY2=VAL2..., --optim_dict KEY1=VAL1,KEY2=VAL2...
                        manually define optimizer configurations as string:
                        KEY1=VALUE1+KEY2=VALUE2+...
  -t KEY1=VAL1,KEY2=VAL2..., --trainer_dict KEY1=VAL1,KEY2=VAL2...
                        manually define trainer configurations as string:
                        KEY1=VALUE1+KEY2=VALUE2+...
  -s SEED, --seed SEED  set random seed, default: random
  -r ROOT_DIR, --root_dir ROOT_DIR
                        directory for storing results
  -i IS_TRAINING, --is_training IS_TRAINING
                        run with training (1) or without training (0)
  -f, --eval_fair       run code with counterfactual fairness experiment (only
                        for German dataset), default: False
  --show_results SHOW_RESULTS
                        run with evaluation (1) or without(0), default: 1
  --cf_sample           evaluate performance for on one counterfactual sample
  --plots PLOTS         run code with plotting (1) or without (0), default: 0
```
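
The `-d`, `-m`, `-o` and `-t` options take a string of `KEY=VALUE` pairs. The usage line shows commas between pairs, while the option descriptions show `+`; the help text does not settle which separator `main.py` accepts. The snippet below is only an illustration of how such a string maps to a dictionary. It is not the repository's parser: `parse_overrides` is a hypothetical helper, the separator is a parameter for that reason, and values are kept as strings.

```python
def parse_overrides(text, sep="+"):
    """Split a 'KEY1=VAL1<sep>KEY2=VAL2' string into a dict of string values."""
    overrides = {}
    for item in filter(None, text.split(sep)):
        key, eq, value = item.partition("=")
        if not key or not eq:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        overrides[key.strip()] = value.strip()
    return overrides


print(parse_overrides("equations_type=linear"))
print(parse_overrides("KEY1=VAL1,KEY2=VAL2", sep=","))
```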


## TensorBoard visualization

You can track different metrics during (and after) training using TensorBoard.
For example, if the root folder of the experiments is `exper_test`, run the following
command in a terminal:

```
tensorboard --logdir exper_test/
```

Then open `http://localhost:6006/` in a browser to visualize all the metrics.


## Create the files to train the models for the experiments

```
python _create_experiment_files.py --dataset collider --model vcause --experiment_name all
python _create_experiment_files.py --dataset collider --model mcvae --experiment_name all
python _create_experiment_files.py --dataset collider --model carefl --experiment_name all
```

```
python _create_experiment_files.py --dataset mgraph --model vcause --experiment_name all
python _create_experiment_files.py --dataset mgraph --model mcvae --experiment_name all
python _create_experiment_files.py --dataset mgraph --model carefl --experiment_name all
```

```
python _create_experiment_files.py --dataset triangle --model vcause --experiment_name all
python _create_experiment_files.py --dataset triangle --model mcvae --experiment_name all
python _create_experiment_files.py --dataset triangle --model carefl --experiment_name all
```

```
python _create_experiment_files.py --dataset chain --model vcause --experiment_name all
python _create_experiment_files.py --dataset chain --model mcvae --experiment_name all
python _create_experiment_files.py --dataset chain --model carefl --experiment_name all
```

```
python _create_experiment_files.py --dataset loan --model vcause --experiment_name all
python _create_experiment_files.py --dataset loan --model mcvae --experiment_name all
python _create_experiment_files.py --dataset loan --model carefl --experiment_name all
```
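
The fifteen commands above follow one pattern: every dataset is paired with every model. As a convenience, they can be generated in a loop. This is a minimal sketch, and `experiment_file_commands` is a hypothetical helper that is not part of the repository. It prints the commands in the same order as listed above and does not run them.

```python
import itertools

# Datasets and models exactly as they appear in the commands above.
DATASETS = ["collider", "mgraph", "triangle", "chain", "loan"]
MODELS = ["vcause", "mcvae", "carefl"]


def experiment_file_commands(datasets=DATASETS, models=MODELS, experiment_name="all"):
    """Return one _create_experiment_files.py call per (dataset, model) pair."""
    return [
        ["python", "_create_experiment_files.py",
         "--dataset", dataset,
         "--model", model,
         "--experiment_name", experiment_name]
        for dataset, model in itertools.product(datasets, models)
    ]


for cmd in experiment_file_commands():
    # To execute a command instead of printing it, pass the list to
    # subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```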