# Stable-QDA: Robust Quadratic Discriminant Analysis for Heavy-Tailed Distributions

This repository contains the implementation and experiments for the paper:

> **Stable-QDA: Correcting Likelihood Misspecification in Quadratic Discriminant Analysis for Heavy-Tailed Data**

## Key Insight

Classical QDA assumes Gaussian class-conditional distributions, which leads to poor classification when data exhibits heavy tails. The core finding of this work is:

> **Correcting the likelihood specification (Gaussian → α-stable) often matters more than using robust parameter estimators.**

Stable-QDA replaces the Gaussian likelihood with an α-stable likelihood that decays polynomially rather than exponentially in the squared Mahalanobis distance D(x). Here p is the feature dimension, and both log-densities are given up to an additive constant:

- **Gaussian**: log f(x) = -D(x)/2 + const  (exponential decay)
- **Stable**: log f(x) = -((α+p)/2) × log(1 + D(x)) + const  (polynomial decay)
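
To make the difference between the two decay rates concrete, here is a minimal sketch that evaluates both log-kernels over a range of distances. The function names and the values `alpha=1.5`, `p=2` are illustrative assumptions, not part of the repository's API, and additive normalizing constants are omitted.

```python
import math


def gaussian_log_kernel(d):
    """Gaussian log-density up to an additive constant: -D/2."""
    return -0.5 * d


def stable_log_kernel(d, alpha, p):
    """Heavy-tailed log-density up to an additive constant:
    -((alpha + p) / 2) * log(1 + D)."""
    return -0.5 * (alpha + p) * math.log1p(d)


# Illustrative values only (assumed, not taken from the paper).
alpha, p = 1.5, 2

for d in (1.0, 10.0, 100.0, 1000.0):
    g = gaussian_log_kernel(d)
    s = stable_log_kernel(d, alpha, p)
    print(f"D={d:7.1f}  gaussian={g:9.2f}  stable={s:7.2f}")
```

At D = 100 the Gaussian term is -50, while the heavy-tailed term is about -8.08, so a single distant point is penalized far less severely under the stable likelihood.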

## Installation

```bash
git clone https://github.com/your-username/stable-qda.git
cd stable-qda
pip install -r requirements.txt
```

## Quick Start

```python
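# Assumes scikit-learn is installed (used here only for the train/test split)
from sklearn.model_selection import train_test_split

from src import StableQDA, diagnose_dataset

# Load your data (placeholder: replace with your own loader)
X, y = load_your_data()

# Hold out a test set so that the variables used below are defined
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Run diagnostics to get an estimator recommendation
result = diagnose_dataset(X, y)
print(f"Recommendation: {result.recommendation}")

# Fit Stable-QDA
clf = StableQDA(alpha=1.5, estimator='standard')
clf.fit(X_train, y_train)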

# Predict
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)
```

## Repository Structure

```
stable-qda/
├── src/                          # Main implementation
│   ├── stable_qda.py             # StableQDA classifier
│   ├── estimators.py             # Location and scatter estimators
│   ├── alpha_estimation.py       # Tail index estimation
│   ├── diagnostics.py            # Estimator selection
│   └── utils.py                  # Evaluation metrics
│
├── experiments/
│   ├── synthetic/                # Synthetic experiments (Section 6)
│   │   ├── exp1_likelihood_benefit.py    # Figure 1
│   │   ├── exp2_alpha_sensitivity.py     # Figure 2
│   │   └── exp3_contamination.py         # Figure 4
│   │
│   └── realworld/                # Real-world experiments (Section 7)
│       ├── run_evaluation.py     # Main evaluation script
│       └── diagnose_dataset.py   # Dataset diagnostic tool
│
├── scripts/
│   └── reproduce_all.sh          # One-command reproduction
│
├── data/
│   └── README.md                 # Dataset download instructions
│
└── results/                      # Generated results
```

## Reproducing Paper Results

### Full Reproduction (~1 hour)

```bash
./scripts/reproduce_all.sh
```

### Quick Test (~5 minutes)

```bash
./scripts/reproduce_all.sh --quick
```

### Specific Experiments

```bash
# Synthetic only
./scripts/reproduce_all.sh synthetic

# Real-world only (requires downloading datasets first)
./scripts/reproduce_all.sh realworld
```

## Estimator Selection Guidelines

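Based on our experiments (Table 3 in the paper), use robust estimators when the estimated α falls below the threshold for your dataset's determinant ratio: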

| Determinant Ratio | Use Robust if α < |
|-------------------|-------------------|
| < 10              | 2.0 (always)      |
| 10-100            | 1.8               |
| 100-1000          | 1.7               |
| > 1000            | 1.6               |

**Practical recommendation**: Use the `diagnose_dataset()` function to automatically select estimators based on your data characteristics.
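
For readers who want to apply the table by hand, the thresholds can be encoded in a few lines. This is a sketch only: `should_use_robust` is a hypothetical helper that is not part of the package, the handling of the exact boundary values (10, 100, 1000) is an assumption, and `diagnose_dataset()` may take additional criteria into account.

```python
def should_use_robust(det_ratio, alpha):
    """Return True if the guideline table suggests robust estimators.

    det_ratio: determinant ratio of the dataset
    alpha:     estimated stability index
    Boundary values are assigned to the higher band (an assumption).
    """
    if det_ratio < 10:
        threshold = 2.0   # robust for every alpha below 2.0
    elif det_ratio < 100:
        threshold = 1.8
    elif det_ratio < 1000:
        threshold = 1.7
    else:
        threshold = 1.6
    return alpha < threshold


# Example: moderate determinant ratio, fairly heavy tails
print(should_use_robust(det_ratio=50, alpha=1.7))
```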

## Key Results

### Synthetic Experiments (Figure 1)

- **Heavy tails (α < 1.5)**: Robust estimators (spatial median + Tyler) improve accuracy by 15-25% over Gaussian QDA
- **Moderate tails + heteroscedasticity**: Standard estimators (mean + Ledoit-Wolf) preserve discriminative scale information
- **Light tails (α > 1.8)**: Gaussian QDA is sufficient

### Real-World Experiments (Table 4)

| Dataset | Gaussian QDA | Stable-QDA | Improvement (percentage points) |
|---------|-------------|------------|-------------|
| HTRU2 | 97.8% | 98.2% | +0.4 |
| Credit Card | 97.9% | 99.1% | +1.2 |
| Ionosphere | 87.5% | 90.8% | +3.3 |
| Weekly | 55.2% | 56.8% | +1.6 |

## API Reference

### StableQDA

```python
StableQDA(
    alpha=1.5,             # Stability index (1.0=Cauchy, 2.0=Gaussian)
    estimator='standard',  # 'standard' (mean+LW) or 'robust' (smed+Tyler)
    reg_param=1e-6,        # Regularization for numerical stability
)
```

### Key Methods

- `fit(X, y)`: Fit the model
- `predict(X)`: Predict class labels
- `predict_proba(X)`: Predict class probabilities
- `score(X, y)`: Return accuracy
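
As a rough illustration of how `predict` and `predict_proba` typically relate in a discriminant classifier, the sketch below turns per-class log-scores into probabilities with the log-sum-exp trick and picks the highest-scoring class. It is an assumption about the general technique, not the repository's actual code; `scores_to_proba` and `predict_from_scores` are hypothetical names.

```python
import math


def scores_to_proba(log_scores):
    """Normalize per-class log-scores into probabilities.

    Subtracting the maximum before exponentiating avoids underflow
    when all log-scores are very negative.
    """
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_scores(log_scores, classes):
    """Return the class label with the highest log-score."""
    best = max(range(len(log_scores)), key=lambda k: log_scores[k])
    return classes[best]


# Example with two classes and made-up log-scores
log_scores = [-8.0, -50.0]
print(scores_to_proba(log_scores))
print(predict_from_scores(log_scores, ["class_0", "class_1"]))
```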

### Diagnostic Functions

```python
from src import StableQDA, diagnose_dataset, get_recommended_config

# Get detailed diagnostics
result = diagnose_dataset(X, y, verbose=True)

# Get recommended configuration
config = get_recommended_config(result)
clf = StableQDA(**config)
```

## Citation

```bibtex
@inproceedings{stable_qda_2026,
  title={Stable-QDA: Correcting Likelihood Misspecification in Quadratic Discriminant Analysis for Heavy-Tailed Data},
  author={Anonymous},
  booktitle={International Conference on Machine Learning},
  year={2026}
}
```

## License

MIT License
