##################################################################################################
# This text describes the official implementation of                                             #
# ANOCE: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning  #
##################################################################################################

Also see our anonymous repository at `https://github.com/anoce-cvae/ANOCE-CVAE`.

######################################################################################

I. Requirements

 - Python 3.7
 - `PyTorch` >1.0
 - `numpy`
 - `pandas`
 - `scipy`
 - `networkx`
 - `multiprocessing`
 - `argparse`
 - `pickle`
 - `os`
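
Before training, it can help to confirm that the third-party packages above are installed. The sketch below is not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes PyTorch is importable as `torch`. The last four entries in the list (`multiprocessing`, `argparse`, `pickle`, `os`) ship with Python and need no separate installation.

```python
import importlib.util

# Import names of the third-party requirements listed above.
# Assumption: PyTorch is installed under the import name "torch".
THIRD_PARTY = ["torch", "numpy", "pandas", "scipy", "networkx"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(THIRD_PARTY)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All third-party requirements are importable.")
```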

######################################################################################

II. Contents

  1. `README.txt`: implementation details of source code

  2. `train.py`: main training code of ANOCE-CVAE

  3. `utils.py`: utility functions of ANOCE-CVAE, including graph simulation, data simulation, VAE utility functions, analysis of causal effects, and accuracy evaluation

  4. Real Dataset for the COVID19 Spread:
    a). Dataset:
	- `covid19.pkl` and `covid19.csv`: the real dataset for the COVID-19 outbreak, where the first column is the exposure (i.e., the 2020 Hubei lockdowns), the last column is the outcome (i.e., the rate of increase of confirmed cases outside Hubei), and the middle 30 columns are the 30 mediators (i.e., the selected major cities), ordered by each city's total migration scale during the data period (Jan 12th to Feb 20th, 2020). The migration-scale information was scraped from Baidu Qianxi (https://qianxi.baidu.com/; credit to the code from https://github.com/samelltiger/baidu_qx).
    b). `Realdata_COVID19_Summary.ipynb`: summary code for tables and figures, written as a Jupyter Notebook.
    c). Figures:
	- the interactive graphical results on the real data analysis, stored in HTML format;
	- the snapshot figures of interactive graphical results, stored in PNG format;
	- other descriptive figures for real data results, stored in PDF format.
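
The column layout described for the real dataset (one exposure column, 30 mediator columns, one outcome column) can be sketched as a positional split. This is only an illustration on synthetic rows, not the repository's loading code: the helper name `split_columns` is hypothetical, and whether `covid19.csv` carries a header row is not stated here, so no file is read. The 38 rows mirror the `--sample_size=38` used for the real data, and 1 + 30 + 1 columns match `--node_number=32`.

```python
def split_columns(rows, n_mediators=30):
    """Split each row into (exposure, mediators, outcome) by position."""
    exposure, mediators, outcome = [], [], []
    for row in rows:
        if len(row) != n_mediators + 2:
            raise ValueError(
                f"expected {n_mediators + 2} columns, got {len(row)}"
            )
        exposure.append(row[0])            # first column: exposure
        mediators.append(list(row[1:-1]))  # middle columns: mediators
        outcome.append(row[-1])            # last column: outcome
    return exposure, mediators, outcome


# Synthetic stand-in for the real data: 38 rows, 32 columns.
rows = [[float(i)] + [0.0] * 30 + [float(-i)] for i in range(38)]
exposure, mediators, outcome = split_columns(rows)
print(len(exposure), len(mediators[0]), len(outcome))  # prints: 38 30 38
```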

  5. Random Graphs for Simulation Studies
    a). True_Graphs_Section5.1: the true graphs for four scenarios in the simulation studies (Section 5.1).
	- `S1_trueG.pkl`: Scenario 1
	- `S2_trueG.pkl`: Scenario 2
	- `S3_trueG.pkl`: Scenario 3
	- `S4_trueG.pkl`: Scenario 4
    b). True_Graphs_Section5.2: the true graphs for six cases in the comparison studies (Section 5.2).
	- `s_ER1_trueG.pkl`: Case ER1
	- `s_ER2_trueG.pkl`: Case ER2
	- `s_ER4_trueG.pkl`: Case ER4
	- `s_SF1_trueG.pkl`: Case SF1
	- `s_SF2_trueG.pkl`: Case SF2
	- `s_SF4_trueG.pkl`: Case SF4
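
Each true-graph file is a Python pickle. The sketch below shows one way to open such a file and inspect what it holds. The helper name `load_pickle` is hypothetical, the path assumes `True_Graphs_Section5.1` is a folder in the working directory, and the type of the stored object is not documented here, so the code only reports it. Unpickling can execute arbitrary code, so load only files from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load and return whatever object is stored in a pickle file."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Illustrative path; adjust to where the graph files are stored.
    path = Path("True_Graphs_Section5.1") / "S1_trueG.pkl"
    if path.exists():
        graph = load_pickle(path)
        print(type(graph))
    else:
        print(f"{path} not found")
```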

######################################################################################

III. Training (Simulation Studies)

To train the model(s) in the simulation studies, run this command:

```train
python train.py --data_type='simulation' --simu_G_file=<CHOICE1> --A_type=<CHOICE2> --node_number=<CHOICE3> --sample_size=<CHOICE4> --batch_size=<CHOICE5>
```
CHOICE1 = 'S1_trueG.pkl', 'S2_trueG.pkl', 'S3_trueG.pkl', 'S4_trueG.pkl', corresponding to four scenarios in Section 5.1, respectively; or 's_ER1_trueG.pkl' to 's_SF4_trueG.pkl' corresponding to six cases in Section 5.2;
CHOICE2 = 'Gaussian', 'Binary', corresponding to two types of exposure;
CHOICE3 = 12, 32, according to the number of nodes in the selected scenario;
CHOICE4 = 50, 500, the sample size in the selected setting;
CHOICE5 = 25, 100, the batch size, chosen according to the sample size in the selected setting.

Example (Scenario 1 with Binary exposure and a sample size of 500):

```train
python train.py --data_type='simulation' --simu_G_file='S1_trueG.pkl' --A_type='Binary' --node_number=12 --sample_size=500 --batch_size=100
```
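
To run several settings in a row, the command can be assembled programmatically. The sketch below is a hypothetical helper, not part of the repository: `build_train_command` and `run_all` are illustrative names, and only the flags shown in this section are used. The settings listed reuse Scenario 1 with 12 nodes from the example; combining it with the Gaussian exposure is an assumption. By default the script only prints the commands.

```python
import shlex
import subprocess


def build_train_command(graph_file, a_type, node_number, sample_size, batch_size):
    """Return the argument list for one simulation run of train.py."""
    return [
        "python", "train.py",
        "--data_type=simulation",
        f"--simu_G_file={graph_file}",
        f"--A_type={a_type}",
        f"--node_number={node_number}",
        f"--sample_size={sample_size}",
        f"--batch_size={batch_size}",
    ]


def run_all(settings, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for setting in settings:
        command = build_train_command(**setting)
        print(" ".join(shlex.quote(arg) for arg in command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Assumed settings: Scenario 1, both exposure types, sample size 500.
    settings = [
        {"graph_file": "S1_trueG.pkl", "a_type": a_type,
         "node_number": 12, "sample_size": 500, "batch_size": 100}
        for a_type in ("Gaussian", "Binary")
    ]
    run_all(settings, dry_run=True)
```

Passing `dry_run=False` executes each command in turn and stops at the first failure, because `subprocess.run` is called with `check=True`.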

######################################################################################

IV. Evaluation (Real Data of COVID-19)

To evaluate the model on the COVID-19 outbreak data and investigate the effect of the 2020 Hubei lockdowns with 100 replications, run:

```eval
python train.py --data_type='realdata' --real_data_file='covid19.pkl' --node_number=32 --sample_size=38 --batch_size=19 --rep_number=100
```

See more details on the collection and meaning of the COVID-19 Dataset in our main text.  


######################################################################################

V. Acknowledgments

Our work and code benefit from existing works, for which we are very grateful.

* DAG-GNN https://github.com/fishmoon1234/DAG-GNN
* DAG NOTEARS https://github.com/xunzheng/notears
