# Analysis of the empirical distribution of the generated SCMs (Appendix G)

This repository contains code for analyzing the empirical distribution of randomly generated Structural Causal Models (SCMs) as described in Appendix G of the paper.

## Set Up the Development Environment

```bash
# Create and activate conda environment
conda create --name random-scm-analysis python=3.11
conda activate random-scm-analysis

# Install the causal-profiler package in editable mode
# (assumes it is checked out in a sibling directory, ../causal-profiler)
pip install -e ../causal-profiler

# Install additional dependencies
pip install dill tqdm pandas scipy
```
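To confirm that the environment is complete before running anything, a small import check can help. This is a minimal sketch, not part of the repository: the import name `causal_profiler` is an assumption about how the causal-profiler package is imported, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumption: "causal_profiler" is the import name of the causal-profiler package;
# the other names correspond to the packages installed above.
REQUIRED_MODULES = ["causal_profiler", "dill", "tqdm", "pandas", "scipy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```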

## Running the Analysis

The analysis consists of two main steps:

1. **Generate SCMs**: Use the `sample_and_analyze_scms.py` script to generate random SCMs over various parameter combinations and compute metrics for each generated SCM.

   ```bash
   python sample_and_analyze_scms.py
   ```

2. **Analyze Results**: Use the Jupyter notebook `scm_analysis.ipynb` to analyze the generated SCMs and visualize their empirical distributions.

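   **Option 1**: Open the notebook with VS Code's Jupyter support:

   - Install the Python and Jupyter extensions for VS Code if they are not already installed
   - Open `scm_analysis.ipynb` in VS Code, which renders it as an interactive notebook
   - Select the `random-scm-analysis` conda environment as the notebook kernel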

   **Option 2**: Use Jupyter Notebook directly:

   ```bash
   # Install Jupyter if not already installed
   pip install notebook

   # Launch the notebook server
   jupyter notebook

   # Navigate to and open scm_analysis.ipynb in the browser interface
   ```
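The first step sweeps over parameter combinations and records metrics for every sampled model. The loop pattern can be sketched as follows. This is a simplified illustration, not the code of `sample_and_analyze_scms.py`: the parameter names `num_nodes` and `edge_prob` and the helpers `sample_random_dag` and `sweep` are hypothetical, and only a random DAG structure is sampled (no causal mechanisms).

```python
import itertools
import random

# Hypothetical parameter grid; the real script defines its own parameters.
PARAM_GRID = {
    "num_nodes": [5, 10],
    "edge_prob": [0.2, 0.5],
}


def sample_random_dag(num_nodes, edge_prob, rng):
    """Sample a random DAG as a list of (parent, child) edges.

    Edges only go from a lower to a higher node index, which guarantees acyclicity.
    """
    return [
        (i, j)
        for i in range(num_nodes)
        for j in range(i + 1, num_nodes)
        if rng.random() < edge_prob
    ]


def sweep(param_grid, samples_per_config, seed=0):
    """Sample graphs for every parameter combination and record one metric per sample."""
    rng = random.Random(seed)  # one seeded RNG makes the whole sweep reproducible
    keys = list(param_grid)
    records = []
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        for _ in range(samples_per_config):
            edges = sample_random_dag(params["num_nodes"], params["edge_prob"], rng)
            records.append({**params, "num_edges": len(edges)})
    return records


if __name__ == "__main__":
    rows = sweep(PARAM_GRID, samples_per_config=3)
    print(len(rows), "records; first:", rows[0])
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame`, which makes grouping by parameter combination straightforward in the analysis step.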
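The notebook visualizes empirical distributions of the computed metrics. One common way to summarize such a distribution is the empirical cumulative distribution function (ECDF). The sketch below is illustrative only and is not taken from `scm_analysis.ipynb`; `empirical_cdf` is a hypothetical helper.

```python
from collections import Counter


def empirical_cdf(samples):
    """Return sorted distinct values and the fraction of samples <= each value."""
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = Counter(samples)
    n = len(samples)
    xs, ps = [], []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        xs.append(value)
        ps.append(cumulative / n)
    return xs, ps


if __name__ == "__main__":
    xs, ps = empirical_cdf([3, 1, 2, 2])
    print(xs, ps)  # [1, 2, 3] [0.25, 0.75, 1.0]
```

The resulting pairs can be drawn as a step function, for example with matplotlib's `step(xs, ps, where="post")`.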

## Pre-sampled Data

Pre-sampled SCM data is provided in the following directory:
`data/2025_01_07_08_08_empirical_distrib`
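To inspect the pre-sampled data outside the notebook, the files can be loaded in a loop. This sketch assumes that the directory contains pickle-compatible files with a `.pkl` extension; both the file format and the extension are assumptions, and `load_pickled_files` is a hypothetical helper.

```python
import pickle
from pathlib import Path

# Assumption: the data files are pickle/dill files ending in ".pkl".
DATA_DIR = Path("data/2025_01_07_08_08_empirical_distrib")


def load_pickled_files(directory, pattern="*.pkl"):
    """Load every file matching `pattern` in `directory`, keyed by file name."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open("rb") as f:
            results[path.name] = pickle.load(f)
    return results


if __name__ == "__main__":
    if DATA_DIR.is_dir():
        loaded = load_pickled_files(DATA_DIR)
        print(f"Loaded {len(loaded)} file(s) from {DATA_DIR}")
    else:
        print(f"{DATA_DIR} not found; run this from the repository root.")
```

If the files were written with `dill` (listed in the dependencies above), replacing `import pickle` with `import dill as pickle` should work, since `dill` mirrors the `load`/`dump` interface of `pickle`.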

## Project Structure

- `sample_and_analyze_scms.py`: Script to generate and analyze SCMs with different parameter combinations
- `utils_metrics.py`: Contains functions for calculating various metrics on SCMs
- `scm_analysis.ipynb`: Jupyter notebook for visualizing and analyzing the generated SCMs
- `data/`: Directory containing the pre-sampled SCM data
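As an illustration of the kind of structural metric that can be computed for each generated SCM, the sketch below derives a few statistics from its causal graph. It is a hypothetical example and does not reproduce the functions in `utils_metrics.py`; the graph is assumed to be given as a node count and a list of `(parent, child)` edges.

```python
from collections import defaultdict, deque


def graph_metrics(num_nodes, edges):
    """Compute simple structural metrics of a DAG given as (parent, child) edges.

    Returns the number of edges, the number of root nodes (no parents),
    the maximum in-degree, and the length (in edges) of the longest directed path.
    """
    children = defaultdict(list)
    in_degree = [0] * num_nodes
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1

    # Kahn's algorithm: visit nodes in topological order while tracking
    # the longest path that ends at each node.
    remaining = in_degree.copy()
    queue = deque(i for i in range(num_nodes) if remaining[i] == 0)
    longest = [0] * num_nodes
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            longest[child] = max(longest[child], longest[node] + 1)
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited != num_nodes:
        raise ValueError("graph contains a cycle")

    return {
        "num_edges": len(edges),
        "num_roots": sum(1 for d in in_degree if d == 0),
        "max_in_degree": max(in_degree, default=0),
        "longest_path": max(longest, default=0),
    }


if __name__ == "__main__":
    print(graph_metrics(3, [(0, 1), (1, 2), (0, 2)]))
    # {'num_edges': 3, 'num_roots': 1, 'max_in_degree': 2, 'longest_path': 2}
```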
