# Replicating the paper's results

We divide the reproducibility of the experiments into two parts: the generation of synthetic data and the evaluation of the generated data. The following sections describe how to reproduce the experiments for each part.
> To reproduce some of the figures, the synthetic data must be downloaded first. The tables can be reproduced either from the results provided in the repository or by re-running the benchmark.

First, create a `.env` file in the project root that contains the path to the project root: copy `.env.example`, rename the copy to `.env`, and update the path in it.
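The copy step can be sketched as a small shell helper. This is only an illustration: `init_env` is a hypothetical name, not a script shipped with the repository, and it assumes that `.env.example` sits in the directory passed as its first argument. It refuses to overwrite an existing `.env`, so local settings are not lost.

```bash
# Hypothetical helper (not part of the repository): create .env from the
# template without overwriting an existing file.
init_env() {
    root="${1:-.}"    # directory expected to contain .env.example
    if [ -f "$root/.env" ]; then
        echo ".env already exists, leaving it untouched"
    elif [ -f "$root/.env.example" ]; then
        cp "$root/.env.example" "$root/.env"
        echo "created $root/.env, now update the path inside it"
    else
        echo "no .env.example found in $root" >&2
        return 1
    fi
}
```

After defining the function, `init_env .` can be called from the project root; the path inside `.env` still has to be edited by hand.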

## Download synthetic data and results

The data and results can be downloaded and extracted with the script below. They are also available on [Google Drive](https://drive.google.com/drive/folders/1L9KarR20JqzU0p8b3G_KU--h2b8sz6ky).

```bash
conda activate reproduce_benchmark
./experiments/reproducibility/download_data_and_results.sh
```

## Evaluation of synthetic data
To run the benchmark and get the results of the metrics, run:

```bash
conda activate reproduce_benchmark
./experiments/reproducibility/evaluate_relational.sh

./experiments/reproducibility/evaluate_tabular.sh

./experiments/reproducibility/evaluate_utility.sh
```
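Because the three evaluation scripts are independent commands, a failure in one of them is easy to miss when they are pasted together. A minimal sketch of a fail-fast wrapper follows; `run_all` is a hypothetical helper, not part of the repository, and it assumes the scripts signal failure through a non-zero exit code.

```bash
# Hypothetical helper (not part of the repository): run each command in order
# and stop at the first one that exits with a non-zero status.
run_all() {
    for script in "$@"; do
        echo "running $script"
        "$script" || { echo "failed: $script" >&2; return 1; }
    done
}

# Usage, from the project root with the reproduce_benchmark environment active:
#   run_all ./experiments/reproducibility/evaluate_relational.sh \
#           ./experiments/reproducibility/evaluate_tabular.sh \
#           ./experiments/reproducibility/evaluate_utility.sh
```

Stopping at the first failure avoids computing later metrics on top of incomplete results.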

## Generation of synthetic data
Each synthetic data generation method may require its own Python environment. Installation instructions for each method are provided in [docs/INSTALLATION.md](/docs/INSTALLATION.md).

After installing the required environment, the synthetic data can be generated by running the following commands:

```bash
conda activate reproduce_benchmark
./experiments/reproducibility/generation/generate_sdv.sh

conda activate rctgan
./experiments/reproducibility/generation/generate_rctgan.sh

conda activate realtabformer
./experiments/reproducibility/generation/generate_realtabformer.sh

conda activate tabular
./experiments/reproducibility/generation/generate_tabular.sh

conda activate gretel
# Gretel requires a separate connection-uid for each dataset; see the Gretel README for more information.
python experiments/generation/gretel/generate_gretel.py --connection-uid <connection-uid> --model lstm
python experiments/generation/gretel/generate_gretel.py --connection-uid <connection-uid> --model actgan

conda activate mostlyai
./experiments/reproducibility/generation/generate_mostlyai.sh <api-key>

cd experiments/generation/clavaddpm
./generate_clavaddpm.sh <dataset-name> <real-data-path> <synthetic-data-path>
```

Instructions for generating data with MOSTLYAI are provided in [experiments/generation/mostlyai/README.md](experiments/generation/mostlyai/README.md). <br>
Further instructions for GRETELAI are provided in [experiments/generation/gretel/README.md](experiments/generation/gretel/README.md).

## Visualising Results
After running the benchmark, run the script below to visualize the results. The figures are saved to `results/figures/`:
```bash
conda activate reproduce_benchmark
./experiments/reproducibility/generate_figures.sh
```
## Reproducing Tables
To reproduce the tables, run the script below. The tables are saved as `.tex` files in `results/tables/`:
```bash
conda activate reproduce_benchmark
./experiments/reproducibility/generate_tables.sh
```
