# README

This folder contains the software and data for the submitted paper, both located in the 'src' directory. Full proofs of our results and additional experimental details are provided in the PDF file "Appendix.pdf".

# Requirements and Setup for Code

First, within the 'src' directory, run the following command:

```
pip install -r requirements.txt
```

This code requires Python >= 3.8 and TensorFlow version **2.3.0**. It is currently not compatible with other TensorFlow versions.
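As a quick sanity check of the environment, the version requirements above can be verified with a small script. This is a minimal sketch and not part of the repository: the helper names are hypothetical, and it assumes TensorFlow was installed under the distribution name `tensorflow` (a variant package name would be reported as missing).

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 8)   # minimum Python version stated above
REQUIRED_TF = "2.3.0"      # exact TensorFlow version stated above


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= REQUIRED_PYTHON


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if not found.

    Assumption: the package is installed under the name 'tensorflow'.
    """
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


def tensorflow_ok(tf_version):
    """Return True only for the exact TensorFlow version required above."""
    return tf_version == REQUIRED_TF


print("Python OK:", python_ok())
print("TensorFlow OK:", tensorflow_ok(installed_tensorflow_version()))
```

Reading the version from package metadata avoids importing TensorFlow itself, so the check stays fast even when the installation is broken.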

Then, after making sure **gcc** is installed, run the following command. It builds the efficient graph samplers that are used downstream when minimizing an empirical risk over the network.

```
python setup.py build_ext --inplace
```


# Data

We use a pre-processed subset of data from the Slovakian social media website Pokec. The original data can be downloaded from https://snap.stanford.edu/data/soc-Pokec.html. Our processed data files can be found in the folder path *src/dat/pokec/regional_subset*. The Python module containing the initial data cleaning code is located at *src/relational_erm/data_cleaning/pokec.py*. 

# Reproducing the Experiments


The default settings for the code match those used in the paper. 

1. To reproduce the experimental results in Section 6.1 of the paper (estimating peer influence for continuous outcomes), run the following command from 'src':

```
python -m relational_erm.rerm_model.keras_model --beta_1 BETA_1 --covariate COVARIATE --seed SEED
```

where BETA_1 (the strength of unobserved confounding) can take any of the values 0, 1, 10; COVARIATE (the variable taken as the hidden source of confounding) can take any of the values 'region', 'registration', 'age'; and SEED can take any value from 1 to 100.

The above script will produce a csv file with outcome predictions for each node under hypothetical interventional treatments T = all 0's and T = all 1's. 
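Covering every parameter setting means 3 x 3 x 100 separate runs. The sketch below shows one way the full list of commands could be generated; it is not part of the repository, the function name `build_commands` is hypothetical, and it assumes the runs are launched sequentially from inside 'src' with the current interpreter.

```python
import itertools
import sys

BETAS = [0, 1, 10]                               # values of --beta_1 listed above
COVARIATES = ["region", "registration", "age"]   # values of --covariate listed above
SEEDS = range(1, 101)                            # seeds 1 to 100


def build_commands():
    """Return one command (as an argument list) per parameter setting."""
    commands = []
    for beta, covariate, seed in itertools.product(BETAS, COVARIATES, SEEDS):
        commands.append([
            sys.executable, "-m", "relational_erm.rerm_model.keras_model",
            "--beta_1", str(beta),
            "--covariate", covariate,
            "--seed", str(seed),
        ])
    return commands


commands = build_commands()
print(len(commands), "runs; first:", " ".join(commands[0]))

# To execute them one after another from inside 'src' (this may take a long time):
# import subprocess
# for cmd in commands:
#     subprocess.run(cmd, check=True)
```

The execution loop is left commented out so that the snippet only lists the runs; on a cluster, each command would more likely be submitted as its own job.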

Once the csv files for all parameter settings of BETA_1, COVARIATE, and SEED are available (they are already included in the directory named 'cluster_simulations'), the average peer effect point estimates and confidence bands can be obtained by running the following inside 'src':

```
python -m relational_erm.data_cleaning.Confidence_Intervals 
```
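For intuition, the sketch below illustrates how per-node predictions under the two interventions could be turned into a point estimate with an error band. It is an illustration only and does not reproduce the `Confidence_Intervals` module: it assumes the effect is the mean difference between predictions under T = all 1's and T = all 0's, and that the band is a normal approximation across seeds. The function names and toy numbers are hypothetical.

```python
import math
import statistics


def average_effect(pred_all_treated, pred_all_control):
    """Mean difference between predictions under T = all 1's and T = all 0's."""
    diffs = [y1 - y0 for y1, y0 in zip(pred_all_treated, pred_all_control)]
    return statistics.mean(diffs)


def normal_band(estimates, z=1.96):
    """Return (low, center, high) for per-seed estimates.

    Assumption: a normal-approximation band, center +/- z * sd / sqrt(n).
    """
    center = statistics.mean(estimates)
    half_width = z * statistics.stdev(estimates) / math.sqrt(len(estimates))
    return center - half_width, center, center + half_width


# Toy predictions for three nodes and three seeds (made-up numbers).
control = [0.5, 1.5, 2.5]
per_seed = [
    average_effect([1.0, 2.0, 3.0], control),
    average_effect([1.2, 2.2, 3.2], control),
    average_effect([0.8, 1.8, 2.8], control),
]
low, center, high = normal_band(per_seed)
print("estimate:", center, "band:", (low, high))
```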


2. To reproduce the experimental results in Section 6.2 of the paper (estimating peer influence for binary outcomes), run the following command from 'src':


```
python -m relational_erm.rerm_model.keras_model2 --covariate COVARIATE --seed SEED
```

where, similarly to the continuous case, COVARIATE can be any of 'region', 'registration', 'age', and SEED can take any value from 1 to 100. This script also produces a csv file with outcome predictions for each node under hypothetical interventional treatments T = all 0's and T = all 1's.

As above, the average peer contagion effects together with their error bands can be obtained by running the following inside 'src':

```
python -m relational_erm.data_cleaning.Confidence_Intervals 
```

3. Finally, the way the treatment and outcome were simulated for the continuous and binary scenarios in this paper is shown in the Python module located at "src/relational_erm/data_cleaning/simulate_treatment_outcome.py". This file also computes the unadjusted (naive) peer contagion effects. The code corresponding to the parametric baseline method for peer contagion can be found at "src/relational_erm/data_cleaning/simulate_baseline_sbm.py".

