# Experiments

This folder contains all experiments for reproducing the paper results.

## Quick Start

```bash
# Quick test (~10 min, see Runtime Estimates)
./scripts/reproduce_all.sh --quick

# Full reproduction (~1 hour)
./scripts/reproduce_all.sh

# Synthetic only
./scripts/reproduce_all.sh synthetic

# Real-world only (requires data download first)
./scripts/reproduce_all.sh realworld
```

## Structure

```
experiments/
├── synthetic/                    # Section 6: Synthetic Experiments
│   ├── common.py                 # Shared data generation & infrastructure
│   ├── exp1_likelihood_benefit.py    # Figure 1: α sweep × scale ratio
│   ├── exp2_alpha_sensitivity.py     # Figure 2: α misspecification
│   └── exp3_contamination.py         # Figure 4: Outlier robustness
│
└── realworld/                    # Section 7: Real-World Experiments
    ├── run_evaluation.py         # 5-fold CV evaluation
    └── diagnose_dataset.py       # Dataset diagnostic tool
```

## Synthetic Experiments

### Experiment 1: When Does the Stable Likelihood Help? (Figure 1)

Tests the interaction between tail heaviness (α) and class heteroscedasticity (scale ratio).

```bash
cd experiments/synthetic
python exp1_likelihood_benefit.py --quick    # Fast test
python exp1_likelihood_benefit.py            # Full run
```

**Output:**
- `figures/fig1_likelihood_benefit.pdf` - 3-panel figure for paper
- `tables/table_d*.tex` - Appendix tables
- `tables/table3_tyler_threshold.tex` - Tyler threshold summary

**Key finding:** Robust estimators (smed+Tyler) win at heavy tails (α < 1.5), whereas standard estimators (mean+LW) preserve discriminative scale at moderate tails.
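
To make the two axes of this experiment concrete, the following is a minimal sketch of how two-class data with a shared tail index α and a class scale ratio could be generated. It uses the Chambers-Mallows-Stuck method for symmetric α-stable variates with independent coordinates. The function names and the independent-coordinate design are assumptions for illustration; the actual generator lives in `common.py` and may differ.

```python
import math
import random


def sample_symmetric_stable(alpha, scale=1.0, rng=random):
    """Draw one symmetric alpha-stable variate (beta = 0) via Chambers-Mallows-Stuck."""
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit exponential
    if alpha == 1.0:
        return scale * math.tan(u)              # Cauchy special case
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def make_two_class_data(alpha, scale_ratio, n_per_class=500, d=10, seed=42):
    """Hypothetical helper: class 0 has scale 1, class 1 has scale `scale_ratio`."""
    rng = random.Random(seed)
    X, y = [], []
    for label, scale in enumerate((1.0, scale_ratio)):
        for _ in range(n_per_class):
            X.append([sample_symmetric_stable(alpha, scale, rng) for _ in range(d)])
            y.append(label)
    return X, y
```

Lower α gives heavier tails, and α = 2 recovers a Gaussian, so sweeping α against `scale_ratio` spans the grid this experiment explores.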

---

### Experiment 2: Sensitivity to α Misspecification (Figure 2)

Tests how sensitive Stable-QDA is to using the wrong α value.

```bash
python exp2_alpha_sensitivity.py --quick
python exp2_alpha_sensitivity.py
```

**Output:**
- `figures/fig2_alpha_sensitivity.pdf` - Sensitivity curves
- `figures/fig3_fixed_vs_estimated.pdf` - Fixed α=1.5 vs estimated

**Key finding:** Fixed α=1.5 performs within 1% of the oracle across all tail regimes.

---

### Experiment 3: Contamination Robustness (Figure 4)

Tests robustness to outlier contamination in training data.

```bash
python exp3_contamination.py --quick
python exp3_contamination.py
```

**Output:**
- `figures/fig4_contamination.pdf` - Degradation curves
- `tables/table_exp3_*.tex` - Accuracy under contamination

**Key finding:** The stable likelihood's polynomial tail decay naturally limits outlier influence.
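
The intuition can be checked numerically. Symmetric stable densities with α < 2 have tails that decay like |x|^-(1+α), so the log-likelihood penalty for a far-away point grows only logarithmically, whereas the Gaussian penalty grows quadratically. The sketch below uses the Cauchy distribution (the α = 1 stable law, which has a simple closed form) as a stand-in; it does not use the repository's own likelihood code.

```python
import math


def gaussian_logpdf(x, scale=1.0):
    """Log-density of a zero-mean Gaussian: quadratic penalty in x."""
    return -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2 * math.pi))


def cauchy_logpdf(x, scale=1.0):
    """Log-density of a zero-centred Cauchy (alpha = 1 stable): logarithmic penalty in x."""
    return -math.log(math.pi * scale * (1.0 + (x / scale) ** 2))


# Compare how strongly each model penalises increasingly extreme points.
for x in (1.0, 10.0, 100.0):
    print(f"x={x:6.1f}  gaussian={gaussian_logpdf(x):10.2f}  cauchy={cauchy_logpdf(x):7.2f}")
```

A single extreme training point therefore shifts a Gaussian fit far more than a heavy-tailed one, which is consistent with the degradation curves this experiment reports.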

---

## Real-World Experiments

### Prerequisites

Download the datasets to the `data/` folder (see `data/README.md` for instructions):
- HTRU2 (pulsar detection)
- Credit Card Fraud
- Ionosphere
- Weekly Stock Returns

### Running Evaluation

```bash
cd experiments/realworld

# Single dataset
python run_evaluation.py --dataset htru2 --data_dir ../../data/

# All datasets
python run_evaluation.py --all --data_dir ../../data/

# With subsampling (for large datasets)
python run_evaluation.py --dataset creditcard --subsample 50000
```

**Output:** JSON files with complete results including:
- Per-class α estimates
- Diagnostic recommendation
- 5-fold CV metrics (accuracy, PR-AUC, recall@precision95)
- Statistical significance (paired t-test vs Gaussian)
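
For readers who want to re-derive the significance numbers from per-fold metrics, here is a minimal sketch of the paired t statistic using only the standard library. The function name is hypothetical, and how `run_evaluation.py` computes its test internally is not shown here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-fold scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

With 5-fold CV the test has 4 degrees of freedom. If SciPy is installed, `scipy.stats.ttest_rel` returns the same statistic together with a p-value.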

### Dataset Diagnostics

Run this before classification to get an estimator recommendation:

```bash
python diagnose_dataset.py --data ../../data/htru2.csv --target class --scale

# Or run demo with synthetic data
python diagnose_dataset.py --demo
```
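
As a rough illustration of what such a recommendation could look like, the sketch below applies the α < 1.5 threshold from the Experiment 1 key finding to per-class α estimates. This is an assumed rule, not the logic of `diagnose_dataset.py`: the function name, the use of the smallest per-class α, and the returned labels are all hypothetical.

```python
def recommend_estimator(alpha_per_class, threshold=1.5):
    """Hypothetical rule: pick robust estimators when any class is heavy-tailed."""
    if not alpha_per_class:
        raise ValueError("need at least one per-class alpha estimate")
    heaviest = min(alpha_per_class.values())  # smaller alpha = heavier tails
    return "smed+Tyler" if heaviest < threshold else "mean+LW"
```

For example, `recommend_estimator({0: 1.2, 1: 1.9})` selects the robust pair because one class falls below the threshold.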

---

## Output Directory

All results are saved to `results/`:

```
results/
├── synthetic/
│   ├── exp1_results.csv
│   ├── figures/
│   │   ├── fig1_likelihood_benefit.pdf
│   │   ├── fig2_alpha_sensitivity.pdf
│   │   ├── fig3_fixed_vs_estimated.pdf
│   │   └── fig4_contamination.pdf
│   └── tables/
│       ├── table3_tyler_threshold.tex
│       ├── table_d*.tex
│       └── table_exp3_*.tex
│
└── realworld/
    ├── results_htru2_*.json
    ├── results_creditcard_*.json
    └── ...
```

---

## Reproducing Paper Figures

| Figure | Script | Command |
|--------|--------|---------|
| Figure 1 | `exp1_likelihood_benefit.py` | `python exp1_likelihood_benefit.py` |
| Figure 2 | `exp2_alpha_sensitivity.py` | `python exp2_alpha_sensitivity.py` |
| Figure 3 | `exp2_alpha_sensitivity.py` | `python exp2_alpha_sensitivity.py` |
| Figure 4 | `exp3_contamination.py` | `python exp3_contamination.py` |

| Table | Script | Output File |
|-------|--------|-------------|
| Table 3 (Tyler threshold) | `exp1_likelihood_benefit.py` | `tables/table3_tyler_threshold.tex` |
| Table 4 (Real-world results) | `run_evaluation.py --all` | Aggregated from JSON outputs |
| Tables D.1-D.3 (Appendix) | `exp1_likelihood_benefit.py` | `tables/table_d*.tex` |
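
Table 4 is aggregated from the per-dataset JSON outputs. A minimal sketch of the collection step is shown below, assuming the `results_<dataset>_*.json` naming from the output directory and dataset names without underscores. The function name is hypothetical, and the JSON schema is not assumed: each file is simply parsed and grouped.

```python
import json
from collections import defaultdict
from pathlib import Path


def collect_results(results_dir):
    """Group parsed JSON result files by the dataset name in their filename."""
    grouped = defaultdict(list)
    for path in sorted(Path(results_dir).glob("results_*_*.json")):
        # results_<dataset>_<suffix>.json -> <dataset>; assumes no underscore in the name
        dataset = path.stem.split("_")[1]
        with path.open(encoding="utf-8") as fh:
            grouped[dataset].append(json.load(fh))
    return dict(grouped)
```

Calling `collect_results("results/realworld")` from the repository root would return one list of result dictionaries per dataset, ready to be summarised into table rows.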

---

## Configuration

### Synthetic Experiments

Default settings (in each `exp*.py`):
```python
DEFAULT_CONFIG = {
    'n_per_class': 500,      # Samples per class
    'd': 10,                  # Dimensions
    'n_repeats': 20,          # Random repeats
    'base_seed': 42,          # Reproducibility
}
```

Quick mode reduces `n_repeats` to 5 and uses fewer α values.
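
One way such an override could be wired up is sketched below. The `build_config` and `repeat_seeds` helpers and the per-repeat seed scheme are assumptions for illustration; the reduced α grid is omitted because its values are not listed here.

```python
DEFAULT_CONFIG = {
    'n_per_class': 500,      # Samples per class
    'd': 10,                 # Dimensions
    'n_repeats': 20,         # Random repeats
    'base_seed': 42,         # Reproducibility
}


def build_config(quick=False):
    """Return a copy of the defaults, with fewer repeats in quick mode."""
    config = dict(DEFAULT_CONFIG)  # copy so the defaults stay untouched
    if quick:
        config['n_repeats'] = 5
    return config


def repeat_seeds(config):
    """One seed per repeat, offset from base_seed (an assumed scheme)."""
    return [config['base_seed'] + i for i in range(config['n_repeats'])]
```

Deriving every repeat's seed from `base_seed` keeps quick and full runs reproducible and makes the quick run a prefix of the full one.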

### Real-World Experiments

```bash
python run_evaluation.py \
    --dataset htru2 \
    --data_dir ../../data/ \
    --output ../../results/realworld/ \
    --n_splits 5 \
    --alpha 1.5
```

---

## Runtime Estimates

| Experiment | Quick Mode | Full Run |
|------------|------------|----------|
| Exp 1 (Likelihood) | ~1 min | ~10 min |
| Exp 2 (Sensitivity) | ~1 min | ~5 min |
| Exp 3 (Contamination) | ~2 min | ~10 min |
| Real-world (all 4) | ~5 min | ~30 min |
| **Total** | **~10 min** | **~1 hour** |

---

## Troubleshooting

**Import errors:** Make sure you're running from the repository root or have `src/` in your Python path.

```bash
# Run from the repository root so that $(pwd)/src resolves correctly
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
```

**Missing datasets:** See `data/README.md` for download instructions.

**Memory issues with Credit Card:** Use `--subsample 50000` to reduce dataset size.
