# Reconstruction / Privacy Experiments

This directory contains code for the privacy reconstruction experiments presented in **Section 6.2, Section 6.3, and Appendix D.1** of the paper. These experiments evaluate data extraction attacks against models in Rashomon sets.

## Overview

The experiments assess privacy vulnerabilities of interpretable models by attempting to reconstruct training data from model parameters. We compare single models against Rashomon sets to understand how model diversity affects privacy guarantees.

## Directory Structure

```
reconstruction/
├── data/                                      # Dataset files
│   ├── adult.csv
│   ├── compas.csv
│   ├── fico.csv
│   ├── default_credit.csv
│   ├── diabetes.csv
│   └── bank-marketing.csv
├── verification/                              # Attack implementations
│   ├── adversary.py                          # Adversary model
│   ├── attack.py                             # Base attack class
│   └── decision_tree_attack.py               # Tree-specific attacks
├── data_extraction_from_rf_experiments.py    # Main experiment script
├── datasets_infos.py                         # Dataset configurations
├── DRAFT.py                                  # DRAFT attack implementation
├── plot_results.py                           # Result visualization
├── rf_wrapper.py                             # Random forest wrapper
├── rset_wrapper.py                           # Rashomon set wrapper
├── tree_classifier.py                        # Tree classifier utilities
└── utils.py                                  # Helper functions
```

## Running the Experiments

### Step 1: Run Data Extraction Experiments

Execute the reconstruction attack experiments with a specific configuration:

```bash
python data_extraction_from_rf_experiments.py --expe_id <config_id>
```

**Configuration IDs:**
- Each `expe_id` corresponds to a unique experimental configuration
- Different IDs test various combinations of:
  - Datasets (Adult, COMPAS, FICO, etc.)
  - Model types (single trees, random forests, Rashomon sets)
  - Attack strategies
  - Privacy parameters
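The mapping from `expe_id` to a configuration is defined in `data_extraction_from_rf_experiments.py` and is not reproduced here. As a hedged sketch of one common pattern, the script could enumerate a Cartesian grid of the factors listed above and treat `--expe_id` as an index into it. The factor levels, `GRID`, `config_for`, and `parse_expe_id` below are all hypothetical and serve only to illustrate the idea.

```python
import argparse
from itertools import product

# Hypothetical factor levels; the real grid is defined in data_extraction_from_rf_experiments.py.
DATASETS = ["adult", "compas", "fico", "default_credit", "diabetes", "bank-marketing"]
MODEL_TYPES = ["tree", "random_forest", "rashomon_set"]
MAX_DEPTHS = [3, 5]
SEEDS = [0, 1, 2]

# Every combination of factors, in a fixed order (the last factor varies fastest).
GRID = list(product(DATASETS, MODEL_TYPES, MAX_DEPTHS, SEEDS))


def config_for(expe_id):
    """Decode an integer experiment id into one point of the grid."""
    if not 0 <= expe_id < len(GRID):
        raise ValueError(f"expe_id must be in [0, {len(GRID) - 1}], got {expe_id}")
    dataset, model_type, max_depth, seed = GRID[expe_id]
    return {"dataset": dataset, "model_type": model_type, "max_depth": max_depth, "seed": seed}


def parse_expe_id(argv=None):
    """Read --expe_id from the command line (or from argv when given)."""
    parser = argparse.ArgumentParser(description="Run one reconstruction experiment.")
    parser.add_argument("--expe_id", type=int, required=True, help="index into the configuration grid")
    return parser.parse_args(argv).expe_id
```

Indexing a fixed, ordered grid keeps each ID stable across runs, which is what makes a plain integer convenient as a job identifier for shell loops or SLURM array jobs.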

For parallel execution across multiple configurations:

```bash
# Launch configurations 0-10 as background jobs (all start at once), then wait for them
for id in {0..10}; do
    python data_extraction_from_rf_experiments.py --expe_id $id &
done
wait
```
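The shell loop above starts every configuration at once, which can oversubscribe a machine when the attacks are expensive. A minimal Python alternative, assuming it is run from `reconstruction/`, caps the number of concurrent runs. `run_all` and `experiment_cmd` are hypothetical helpers, not part of this directory.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def experiment_cmd(expe_id):
    """Command line for one configuration (same invocation as the shell loop)."""
    return [sys.executable, "data_extraction_from_rf_experiments.py", "--expe_id", str(expe_id)]


def run_all(expe_ids, max_workers=4, cmd_for=experiment_cmd):
    """Run each configuration as a subprocess, at most max_workers at a time.

    Returns a dict mapping expe_id -> process exit code.
    """
    def run_one(expe_id):
        # Each worker thread blocks on its own child process.
        return expe_id, subprocess.run(cmd_for(expe_id)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, expe_ids))
```

For example, `run_all(range(11), max_workers=4)` runs the same eleven configurations as the loop, four at a time, and returns each exit code so that failed IDs can be rerun. Threads suffice here because each worker only waits on a child process.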

### Step 2: Visualize Results

Generate plots from the experimental results:

```bash
python plot_results.py
```

This creates visualizations showing:
- Reconstruction success rates
- Privacy loss metrics
- Comparison between single models and Rashomon sets
- Trade-off curves
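How `plot_results.py` scores reconstructions is not specified here. As an illustration only, one natural measure of reconstruction quality matches each reconstructed row to a distinct original row so as to minimize total disagreement, then averages the per-row fraction of mismatched features; a success rate can then be reported as one minus this error. The sketch assumes discrete features and equally sized datasets, and it brute-forces the matching, so it only suits toy inputs. The function names are hypothetical.

```python
from itertools import permutations


def row_error(reconstructed, original):
    """Fraction of features on which two rows disagree (discrete features assumed)."""
    return sum(a != b for a, b in zip(reconstructed, original)) / len(original)


def reconstruction_error(reconstructed, original):
    """Average per-row error under the best one-to-one matching of reconstructed to original rows.

    Brute force over permutations: only for tiny examples. For real datasets the same
    assignment problem can be solved with scipy.optimize.linear_sum_assignment.
    """
    if not original or len(reconstructed) != len(original):
        raise ValueError("datasets must be non-empty and have the same number of rows")
    n = len(original)
    best = min(
        sum(row_error(reconstructed[i], original[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n
```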

## Datasets

The experiments use six real-world datasets:

| Dataset | Description | Instances | Features |
|---------|-------------|------|----------|
| **Adult** | Income prediction | ~48K | 14 |
| **COMPAS** | Recidivism prediction | ~7K | 13 |
| **FICO** | Credit scoring | ~10K | 23 |
| **Default Credit** | Default prediction | ~30K | 23 |
| **Diabetes** | Diabetes prediction | ~768 | 8 |
| **Bank Marketing** | Marketing response | ~45K | 16 |

All datasets are preprocessed and stored in the `data/` directory.
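If a run fails on a missing or malformed dataset, a quick check of the `data/` directory helps. The sketch below is a hypothetical helper, not part of the codebase: it looks for the six files from the directory listing and reports rows and columns for each. It assumes a single header row per file, so column counts include the label and may differ from the feature counts above after preprocessing.

```python
import csv
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ["adult.csv", "compas.csv", "fico.csv",
                  "default_credit.csv", "diabetes.csv", "bank-marketing.csv"]


def check_data_dir(data_dir="data"):
    """Return (missing_files, {file: (n_rows, n_columns)}) for the expected CSVs."""
    data_dir = Path(data_dir)
    missing, shapes = [], {}
    for name in EXPECTED_FILES:
        path = data_dir / name
        if not path.is_file():
            missing.append(name)
            continue
        with path.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])          # assumed single header row
            n_rows = sum(1 for _ in reader)    # remaining lines are data rows
        shapes[name] = (n_rows, len(header))
    return missing, shapes
```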

## Experimental Parameters

Key parameters that can be configured in the scripts:

- **Rashomon set epsilon** (`eps`): Controls the size of the Rashomon set
- **Attack budget**: Resources available to the adversary
- **Model complexity**: Maximum depth/size of decision trees
- **Privacy budget**: Differential privacy parameters (if applicable)
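The role of `eps` can be made concrete. Under one standard definition, the Rashomon set contains every model whose empirical loss is within `eps` of the best model's loss, either additively ($L(f) \le L(f^*) + \varepsilon$) or multiplicatively ($L(f) \le (1+\varepsilon)\,L(f^*)$). Which variant `rset_wrapper.py` uses is not stated here, so the sketch supports both; `rashomon_set` is a hypothetical name, and models are represented simply as `(model, loss)` pairs.

```python
def rashomon_set(models_with_loss, eps, multiplicative=False):
    """Keep every model whose loss is within eps of the best loss.

    additive:        loss <= best + eps
    multiplicative:  loss <= (1 + eps) * best
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    best = min(loss for _, loss in models_with_loss)
    bound = best * (1 + eps) if multiplicative else best + eps
    return [model for model, loss in models_with_loss if loss <= bound]
```

Raising `eps` can only loosen the bound, so the set grows monotonically with `eps`.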

## Attack Methods

The experiments implement several data extraction attacks:

1. **DRAFT**: Dataset Reconstruction Attack From Trained ensembles (Ferry et al., 2024), which reconstructs training data from a trained tree ensemble
2. **Direct Extraction**: Extracting data points that satisfy tree constraints
3. **Optimization-based**: Using constraint solvers to find valid data points

See `DRAFT.py` and the `verification/` directory for the attack implementations.
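To make "data points that satisfy tree constraints" concrete, the toy sketch below assumes that each released tree reveals how many training rows reach each of its leaves, and searches for every dataset consistent with those counts. It is not the implementation in `DRAFT.py` or `verification/`: features are binary, the tree encoding and function names are hypothetical, and exhaustive enumeration stands in for the constraint solver an optimization-based attack would use.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# A tree is either ("leaf", name) or (feature_index, subtree_if_0, subtree_if_1).


def leaf_of(tree, x):
    """Follow the splits of `tree` for the binary row `x` and return the leaf name."""
    while tree[0] != "leaf":
        feature, if_zero, if_one = tree
        tree = if_one if x[feature] else if_zero
    return tree[1]


def leaf_counts(tree, rows):
    """Number of rows that fall into each leaf (the information a released tree may expose)."""
    return Counter(leaf_of(tree, x) for x in rows)


def consistent_datasets(trees, observed, n_rows, n_features):
    """Enumerate every multiset of n_rows binary rows that reproduces all observed leaf counts."""
    candidates = list(product((0, 1), repeat=n_features))
    return [
        dataset
        for dataset in combinations_with_replacement(candidates, n_rows)
        if all(leaf_counts(t, dataset) == obs for t, obs in zip(trees, observed))
    ]


# Toy example: two trees trained on a hidden 3-row dataset.
tree_a = (0, ("leaf", "a0"), ("leaf", "a1"))                          # splits on feature 0
tree_b = (1, ("leaf", "b0"), (2, ("leaf", "b10"), ("leaf", "b11")))  # feature 1, then 2
secret = [(0, 1, 1), (1, 0, 0), (1, 1, 0)]
observed = [leaf_counts(t, secret) for t in (tree_a, tree_b)]
sols = consistent_datasets([tree_a, tree_b], observed, n_rows=3, n_features=3)
print(len(sols))  # 6 candidate datasets reproduce these counts, the true one among them
```

Adding a tree adds constraints, which can only shrink (or leave unchanged) the set of consistent datasets; this illustrates why the number and diversity of released models matter for privacy.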

## Expected Output

Running the experiments generates:
- JSON files with detailed results per configuration
- CSV files with aggregated statistics
- Plot files (PNG/PDF) for the paper figures (produced by `plot_results.py`)
- Console logs showing progress and key metrics
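The exact file names and JSON schema are set by the experiment script and are not documented here. Assuming, hypothetically, that each JSON file holds a flat record with `dataset`, `model_type`, and `reconstruction_error` fields, per-configuration results could be reduced to a CSV of group means along these lines (`aggregate` is an illustrative name).

```python
import csv
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def aggregate(results_dir, out_csv, keys=("dataset", "model_type"), metric="reconstruction_error"):
    """Average `metric` over JSON result files sharing the same `keys`; write one CSV row per group."""
    groups = defaultdict(list)
    for path in sorted(Path(results_dir).glob("*.json")):
        record = json.loads(path.read_text())  # assumed flat record (hypothetical schema)
        groups[tuple(record[k] for k in keys)].append(record[metric])
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([*keys, f"mean_{metric}", "n_runs"])
        for group, values in sorted(groups.items()):
            writer.writerow([*group, mean(values), len(values)])
    return dict(groups)
```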

## Notes

- **Attribution**: This code is adapted from the [DRAFT repository](https://github.com/vidalt/DRAFT) with significant modifications for Rashomon set analysis.
- **Computation**: Some experiments may be computationally intensive. Consider using parallel execution or SLURM for large-scale runs.
- **Reproducibility**: Set random seeds in the scripts for reproducible results.
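A hypothetical `set_seed` helper shows one way to fix the seeds mentioned above; the scripts' actual seeding code may differ.

```python
import os
import random


def set_seed(seed):
    """Seed the random number generators an experiment is likely to use."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started afterwards
    try:
        import numpy as np
        np.random.seed(seed)  # global NumPy state, if NumPy is installed
    except ImportError:
        pass
```

For scikit-learn estimators, passing `random_state=seed` explicitly to each model is more robust than relying on the global NumPy state.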

## Troubleshooting

**Issue**: Import errors for verification modules
- **Solution**: Ensure the `verification/` directory has an `__init__.py` file, and run the scripts from inside `reconstruction/` so that `verification` is importable

**Issue**: Missing dataset files
- **Solution**: Verify all CSV files are present in the `data/` directory

**Issue**: Gurobi license errors
- **Solution**: Ensure you have a valid Gurobi license if using optimization-based attacks
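To tell a missing `gurobipy` installation apart from a license problem before launching a long run, a small check like the following can help. It is a sketch using the public `gurobipy` API (`Model`, `GurobiError`), and `gurobi_status` is a hypothetical name. Note that the pip-installed package ships with a size-limited license, so a successful check does not guarantee that large models will solve.

```python
def gurobi_status():
    """Return (ok, message) describing whether gurobipy can create a model."""
    try:
        import gurobipy as gp
    except ImportError:
        return False, "gurobipy is not installed"
    try:
        # Creating a model starts the default environment, which checks for a license.
        model = gp.Model()
    except gp.GurobiError as exc:
        return False, f"Gurobi license problem: {exc}"
    model.dispose()
    return True, "gurobipy is installed and a license was found"
```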

## Citation

If you use the DRAFT attack implementation, please also cite:

```bibtex
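@inproceedings{ferry2024trained,
  title={Trained Random Forests Completely Reveal your Dataset},
  author={Ferry, Julien and Fukasawa, Ricardo and Pascal, Timoth{\'e}e and Vidal, Thibaut},
  booktitle={Proceedings of the 41st International Conference on Machine Learning (ICML)},
  year={2024}
}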
```
