# Reproducing Main Results

Follow the steps below to reproduce the main results from our paper.


## 1. Generate the Dataset

Run `data_generator.py` to generate the synthetic dataset.

- **Configuration**: Modify `config/data_gen.yaml` to change tunable parameters such as the number of incidents, variables, and states.
- Internally, this script calls `graph_gen.py`, which generates individual graph structures using `config/graph_gen.yaml`.


## 2. Learn the Prior

Before running RCG(CPDAG), we need to learn the true CPDAG:

- Use `learn_kess_g.py` with `k = -1` to learn the ground truth CPDAG.
- You may specify multiple values of `k` to generate different `k`-essential graphs.
- Use `learn_prior.yaml` for configuration.
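
The role of `k` can be illustrated with a minimal sketch. It assumes that `k` caps the size of the conditioning sets used in CI tests, and that `-1` lifts the cap so that all subsets are considered. The helper `conditioning_sets` is hypothetical and is not taken from `learn_kess_g.py`.

```python
from itertools import combinations


def conditioning_sets(candidates, k):
    """Yield candidate conditioning sets as sorted tuples.

    Assumption for illustration: k == -1 means no size limit,
    while k >= 0 caps the conditioning-set size at k.
    """
    items = sorted(candidates)
    max_size = len(items) if k == -1 else min(k, len(items))
    for size in range(max_size + 1):
        for subset in combinations(items, size):
            yield subset


# With k = 1, only the empty set and the singletons are enumerated.
print(list(conditioning_sets({"A", "B", "C"}, 1)))
# With k = -1, every subset of the candidates is enumerated.
print(len(list(conditioning_sets({"A", "B", "C"}, -1))))
```

Smaller values of `k` mean fewer and lower-order CI tests, which is the usual motivation for bounding the conditioning-set size.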

**Important parameters in `learn_kess_g.py`:**

- `k`: Value for `k`-PC (`-1` for the CPDAG, or any positive value to use as the parameter `k` for `k`-PC).
- `ORACLE`: Whether to use d-separation (oracle) or data-based CI tests.
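
To make the data-based option concrete, the sketch below computes Pearson's chi-square statistic for two discrete variables and compares it with the standard 5% critical value for one degree of freedom. It is an unconditional test written for illustration only. The function name `chi_square_statistic` is hypothetical, and the repository's own CI test may differ.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Return (statistic, degrees of freedom) for two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


# Toy data: y copies x, so the two variables are fully dependent.
x = [0, 0, 1, 1] * 25
y = list(x)
stat, dof = chi_square_statistic(x, y)

CRITICAL_95_DOF1 = 3.841  # chi-square critical value for dof = 1, alpha = 0.05
print(stat, dof, stat > CRITICAL_95_DOF1)
```

An oracle, by contrast, answers the same independence query by checking d-separation in the known graph, so it needs no samples and makes no statistical errors.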


## 3. Run the Experiment

Run `compare_rcd.py` to execute RCG and the baselines:

- **Experiment types**:
  - Vary the number of nodes and evaluate top-$l$ accuracy (see Figure 4(a)).
  - Vary the number of anomalous samples and evaluate top-1 accuracy (see Figure 4(b)).

Use the `--exp` flag to switch between experiment types: `1` varies the number of nodes (Figure 4(a)) and `2` varies the number of anomalous samples (Figure 4(b)).

- **Baselines**: To select or modify the set of baselines, edit the `BASELINES` list in `compare_rcd.py`.
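
For readers unfamiliar with the metric, top-$l$ accuracy can be sketched as follows. The sketch assumes each method returns a ranked list of candidate root causes per incident and that each incident has a single true root cause. The function and variable names are hypothetical and do not come from `compare_rcd.py`.

```python
def top_l_accuracy(rankings, true_root_causes, l):
    """Fraction of incidents whose true root cause is among the first l candidates."""
    if len(rankings) != len(true_root_causes):
        raise ValueError("need exactly one true root cause per ranking")
    if not rankings:
        return 0.0
    hits = sum(
        truth in ranked[:l]
        for ranked, truth in zip(rankings, true_root_causes)
    )
    return hits / len(rankings)


# Three toy incidents, each with a ranked candidate list (most suspicious first).
rankings = [["X3", "X1", "X7"], ["X2", "X5", "X4"], ["X9", "X8", "X6"]]
truths = ["X3", "X5", "X0"]

print(top_l_accuracy(rankings, truths, 1))  # 1 of 3 incidents is a hit
print(top_l_accuracy(rankings, truths, 2))  # 2 of 3 incidents are hits
```

Top-1 accuracy is the special case `l = 1`, where only the highest-ranked candidate counts.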


## 4. Plot the Results

Visualize experiment results using `plot_exp.py`.

- Use the `--path` argument to provide the output directory generated by `compare_rcd.py`.
- Use the same `--exp` value as in the previous step to select the correct plot.


## Reproducing Results from the Paper

To facilitate reproducibility, we have included the dataset used in our paper. It is available in the `UAI-25-dataset` directory.

To reproduce the results shown in the paper, run the following commands:

```bash
# Reproduce results for Figure 4(a)
python3 compare_rcd.py --path UAI-25-dataset --exp 1
python3 plot_exp.py --path UAI-25-dataset/exp_results/{PATH} --exp 1

# Reproduce results for Figure 4(b)
python3 compare_rcd.py --path UAI-25-dataset --exp 2
python3 plot_exp.py --path UAI-25-dataset/exp_results/{PATH} --exp 2
```

Replace `{PATH}` with the specific subdirectory created during the experiment run.
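
If it is unclear which subdirectory a run created, a small helper along these lines can locate it. This is a sketch under the assumption that the newest subdirectory of `exp_results` belongs to the latest run, which may not hold if other folders there were modified more recently. The helper names are hypothetical, and the command is only built, not executed.

```python
from pathlib import Path


def latest_run_dir(results_root):
    """Return the most recently modified subdirectory of results_root, or None."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda p: p.stat().st_mtime)


def plot_command(results_root, exp):
    """Build the plot_exp.py command for the newest run without running it."""
    run_dir = latest_run_dir(results_root)
    if run_dir is None:
        raise FileNotFoundError(f"no run directories under {results_root}")
    return ["python3", "plot_exp.py", "--path", str(run_dir), "--exp", str(exp)]
```

For example, `plot_command("UAI-25-dataset/exp_results", 1)` returns the argument list for the Figure 4(a) plot, which can be inspected before being passed to `subprocess.run`.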

The final plots used in the paper can be found in `UAI-25-dataset/exp_results/nodes` and `UAI-25-dataset/exp_results/int-samples`.

---

## Important Files and Folders

| File / Folder        | Description |
|----------------------|-------------|
| `graph_gen.py`       | Generates a single graph (incident) using `config/graph_gen.yaml`. |
| `data_generator.py`  | Generates the full dataset by calling `graph_gen.py` multiple times. Configured via `config/data_gen.yaml`. |
| `learn_kess_g.py`    | Learns the `k`-essential graph or CPDAG from an incident. |
| `para_kpc/`          | Parallel implementation of `k`-PC using multiprocessing. |
| `learn_prior.py`     | Learns the `k`-essential graph or CPDAG for a dataset. |
| `m_igs.py`           | Implementation of M-IGS. |
| `rcg.py`             | Implementation of `RCG`. |
| `compare_rcd.py`     | Runs multiple baselines on a given dataset and stores the results. |
| `plot_exp.py`        | Plots the results generated by `compare_rcd.py`. |

---

## Additional Notes

- **CI Tests**: We use chi-square (`chisq`) as the default CI test, which is suitable for discrete data.
- **Discretization**: For real-world datasets (e.g., Sock-shop), discretize continuous data using the `BINS` parameter.
- **Preprocessing**: Use the `PRE_PROCESS` flag to remove irrelevant columns such as timestamps or constant-value features. See the functions `_select_cols` and `discretize` in `base_utils.py` for more information.
- **Parallelism**: Most scripts support parallel execution via `THREADING` and `WORKERS` parameters.
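
As an illustration of the discretization step, the sketch below maps a continuous column to integer bin indices using equal-width bins. It assumes `BINS` denotes the number of bins. The function `discretize_equal_width` is hypothetical, and the `discretize` function in `base_utils.py` may use a different binning scheme.

```python
def discretize_equal_width(values, bins):
    """Map continuous values to bin indices 0..bins-1 using equal-width bins."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; everything lands in bin 0.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


latencies = [0.0, 2.5, 5.0, 7.5, 10.0]
print(discretize_equal_width(latencies, 4))  # prints [0, 1, 2, 3, 3]
```

The resulting integer codes can be treated as discrete states, which is what a chi-square CI test expects.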

