# Analysis of the empirical distribution of the generated SCMs (Appendix H)

This repository contains code for analyzing the empirical distribution of randomly generated Structural Causal Models (SCMs) as described in Appendix H of the paper.

## Setup Development Environment

```bash
# Create and activate conda environment
conda create --name random-scm-analysis python=3.11
conda activate random-scm-analysis

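# Install the causal-profiler package in editable mode
# (expected in a sibling directory; installing by path keeps you in this directory)
pip install -e ../causal-profiler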

# Install additional dependencies
pip install dill tqdm pandas scipy seaborn plotly scikit-learn bnlearn
```

## Running the SCM Distribution Analysis

The analysis consists of two main steps:

1. **Generate SCMs**: Use the `sample_and_analyze_scms.py` script to generate random SCMs with various parameter combinations and calculate metrics for each generated SCM.

   ```bash
   python sample_and_analyze_scms.py
   ```

2. **Analyze Results**: Use the Jupyter notebook `scm_analysis.ipynb` to analyze the generated SCMs and produce visualizations for the empirical distributions.

   **Option 1**: Open the notebook with VSCode's integrated Jupyter support:

   - Simply click on the notebook file in VSCode
   - VSCode will automatically render the notebook interface

   **Option 2**: Use Jupyter Notebook directly:

   ```bash
   # Install Jupyter if not already installed
   pip install notebook

   # Launch the notebook server
   jupyter notebook

   # Navigate to and open scm_analysis.ipynb in the browser interface
   ```

## Running the Comparison with bnlearn and causalNF Datasets

Open `scm_vs_bnlearn.ipynb` and `scm_vs_causalnf.ipynb`, then run all cells.
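
If you would rather execute the comparison notebooks without opening them, one option is to drive `jupyter nbconvert` from a small Python helper. The following is a minimal sketch, not part of the repository: the helper names `build_nbconvert_command` and `run_notebooks` are hypothetical, and it assumes `nbconvert` is installed (it is pulled in by `pip install notebook`) and that the helper runs from the directory containing the notebooks.

```python
import subprocess

# Notebook file names as listed in this README.
COMPARISON_NOTEBOOKS = ["scm_vs_bnlearn.ipynb", "scm_vs_causalnf.ipynb"]


def build_nbconvert_command(notebook: str) -> list[str]:
    """Build the CLI call that executes a notebook and saves the outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",  # keep the result as a notebook
        "--execute",         # run all cells top to bottom
        "--inplace",         # overwrite the input file with the executed version
        notebook,
    ]


def run_notebooks(notebooks=COMPARISON_NOTEBOOKS) -> None:
    """Execute each notebook in turn; raise CalledProcessError if one fails."""
    for notebook in notebooks:
        subprocess.run(build_nbconvert_command(notebook), check=True)
```

Calling `run_notebooks()` should be equivalent to opening each notebook and choosing "Run All", with the executed cells written back to the same files.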

## Pre-sampled Data

Pre-sampled SCM data is provided in the following directories, in case you prefer to run the notebooks directly without generating SCMs:

- `data/2025_01_07_08_08_empirical_distrib`
- `data/2025_07_09_19_03_empirical_distrib`
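
Before running the notebooks, it can help to confirm that the pre-sampled directories are present. The following is a minimal sketch, not part of the repository: `find_presampled_runs` is a hypothetical helper that assumes run directories sit directly under `data/` and end in `_empirical_distrib`, as the two names above do. It makes no assumption about the files inside each directory.

```python
from pathlib import Path


def find_presampled_runs(data_dir="data"):
    """Return directories under data_dir whose names end in '_empirical_distrib', sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        # No data directory at all: report nothing rather than raising.
        return []
    return sorted(p for p in root.glob("*_empirical_distrib") if p.is_dir())


def describe_runs(data_dir="data"):
    """Return one human-readable line per run directory found."""
    return [f"found pre-sampled run: {p.name}" for p in find_presampled_runs(data_dir)]
```

An empty result from `find_presampled_runs()` suggests the data was not checked out or the helper was called from a different working directory.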

## Project Structure

- `sample_and_analyze_scms.py`: Script to generate and analyze SCMs with different parameter combinations
- `utils_metrics.py`: Contains functions for calculating various metrics on SCMs
- `scm_analysis.ipynb`: Jupyter notebook for visualizing and analyzing the generated SCMs
- `scm_vs_bnlearn.ipynb`: Jupyter notebook for comparison with bnlearn datasets
- `scm_vs_causalnf.ipynb`: Jupyter notebook for comparison with causalNF datasets
- `data/`: Directory containing the pre-sampled SCM data
