# 5D Robustness Benchmark: Sample Size Efficiency

This repository compares the sample efficiency of **Nested DRO** against three robust baselines (OR-WDRO, UOT-DRO, and Standard DRO). The experiment runs on a 5-dimensional "Killer" dataset and tracks how excess risk changes as the sample size grows from $N = 500$ to $N = 1000$.

## The "Killer" Dataset (5D)

The dataset is designed to confound estimators that rely solely on distance or density. It consists of 5-dimensional feature vectors with three distinct subgroups:

1.  **Normal Background (70%)**: $y = {w^*}^\top x + b + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$. Standard clean data.
2.  **Distant High Leverage (10%)**: Points located far from the center ($10\times$ distance) but with **correct** labels. These are informative and should be fitted.
3.  **Proximal Outliers (20%)**: Points located near the data center (high density region) but with **adversarial** labels (inverted weights $-w^*$ and large bias $+10$). These are designed to be indistinguishable from normal data by simple distance metrics.
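
The three subgroups above can be sketched in NumPy as follows. This is an illustrative sketch, not the generator used in `synthetic_5d.py`: the function name `make_killer_dataset`, the default `w_star` and `b`, the Gaussian feature distribution, and the reading of "$10\times$ distance" as a $10\times$ scaling of the features are all assumptions.

```python
import numpy as np


def make_killer_dataset(n, dim=5, seed=0, w_star=None, b=0.5):
    """Hypothetical generator for the three-subgroup dataset described above."""
    rng = np.random.default_rng(seed)
    if w_star is None:
        w_star = np.ones(dim)  # assumed ground-truth weights

    n_lev = int(round(0.10 * n))  # distant high-leverage points
    n_out = int(round(0.20 * n))  # proximal outliers
    n_bg = n - n_lev - n_out      # normal background (remaining ~70%)

    # 1. Normal background: clean linear model with unit Gaussian noise.
    x_bg = rng.standard_normal((n_bg, dim))
    y_bg = x_bg @ w_star + b + rng.standard_normal(n_bg)

    # 2. Distant high leverage: features scaled 10x (assumption), labels still correct.
    x_lev = 10.0 * rng.standard_normal((n_lev, dim))
    y_lev = x_lev @ w_star + b + rng.standard_normal(n_lev)

    # 3. Proximal outliers: same feature distribution as the background,
    #    but labels come from inverted weights -w* and a bias of +10.
    x_out = rng.standard_normal((n_out, dim))
    y_out = x_out @ (-w_star) + 10.0 + rng.standard_normal(n_out)

    X = np.concatenate([x_bg, x_lev, x_out])
    y = np.concatenate([y_bg, y_lev, y_out])
    # Group ids: 0 = background, 1 = high leverage, 2 = proximal outlier.
    group = np.concatenate(
        [np.zeros(n_bg), np.ones(n_lev), np.full(n_out, 2)]
    ).astype(int)

    perm = rng.permutation(n)  # shuffle so subgroups are interleaved
    return X[perm], y[perm], group[perm], w_star
```

Returning the `group` labels makes it easy to check which subgroup an estimator ends up down-weighting, for example by comparing residuals on group 1 against group 2.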

## Algorithms

1.  **Nested DRO**: Uses variance regularization and dynamic epsilon to distinguish between useful high-leverage points and harmful proximal outliers.
2.  **OR-WDRO**: Outlier-Robust WDRO using trimmed loss and robust mean/variance estimation.
3.  **UOT-DRO**: Unbalanced Optimal Transport relaxation.
4.  **Standard DRO**: Baseline WDRO.
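
To illustrate the "trimmed loss" idea mentioned for OR-WDRO, here is a generic sketch of a trimmed mean of squared residuals: the largest per-sample losses are discarded before averaging. It is not the OR-WDRO objective from this repository; the function name `trimmed_mean_loss` and the parameter `trim_frac` are hypothetical.

```python
import numpy as np


def trimmed_mean_loss(residuals, trim_frac=0.2):
    """Mean squared residual after dropping the largest `trim_frac` fraction."""
    losses = np.asarray(residuals, dtype=float) ** 2
    n_drop = int(np.floor(trim_frac * losses.size))  # number of losses to discard
    n_keep = max(1, losses.size - n_drop)            # always keep at least one
    kept = np.sort(losses)[:n_keep]                  # smallest losses survive
    return float(kept.mean())
```

On this dataset, trimming alone does not separate the two contaminating subgroups: under a poor initial fit, both high-leverage points and proximal outliers can have large residuals.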

## File Structure

* **`synthetic_5d.py`** (Main Experiment):
    * Runs the benchmark for sample sizes $N = 500, 600, \dots, 1000$ (steps of 100).
    * Repeats experiments over 10 random seeds per sample size.
    * Generates the plot `excess_risk_5D_plot.pdf`.
* **`or_wdro_parameterSearching.py`**:
    * Grid search utility to find the optimal $\epsilon$ and `var_mult` (variance multiplier) for OR-WDRO.
* **`uot_dro_parameterSearching.py`**:
    * Grid search utility to find the optimal regularization parameters ($\lambda, \beta, \lambda_2$) for UOT-DRO.
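
The protocol described above (six sample sizes, 10 seeds each, plus a parameter grid search) can be sketched as below. This is a minimal sketch under stated assumptions, not the code in these scripts: `run_benchmark`, `grid_search`, `sample_clean`, and `fit_ols` are hypothetical names, ordinary least squares on clean data stands in for the DRO estimators, and the closed-form excess risk assumes squared loss with test features drawn from $\mathcal{N}(0, I)$.

```python
import itertools

import numpy as np

SAMPLE_SIZES = [500, 600, 700, 800, 900, 1000]
N_SEEDS = 10


def excess_risk(w_hat, b_hat, w_star, b_star):
    # Assumes squared loss and test features x ~ N(0, I), in which case
    # risk(w, b) - risk(w*, b*) = ||w - w*||^2 + (b - b*)^2.
    return float(np.sum((w_hat - w_star) ** 2) + (b_hat - b_star) ** 2)


def sample_clean(n, rng, w_star, b_star):
    # Stand-in sampler: clean linear data only (no contamination).
    X = rng.standard_normal((n, w_star.size))
    y = X @ w_star + b_star + rng.standard_normal(n)
    return X, y


def fit_ols(X, y):
    # Stand-in estimator: ordinary least squares with an intercept column.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def run_benchmark(fit, sample, w_star, b_star,
                  sizes=SAMPLE_SIZES, n_seeds=N_SEEDS):
    """Return {N: (mean excess risk, std excess risk)} over the seeds."""
    results = {}
    for n in sizes:
        risks = []
        for seed in range(n_seeds):
            rng = np.random.default_rng(seed)
            X, y = sample(n, rng, w_star, b_star)
            w_hat, b_hat = fit(X, y)
            risks.append(excess_risk(w_hat, b_hat, w_star, b_star))
        results[n] = (float(np.mean(risks)), float(np.std(risks)))
    return results


def grid_search(score, grid):
    """Exhaustively evaluate `score(**params)` and return the minimizer."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score
```

For example, `grid_search(score, {"epsilon": [0.01, 0.1, 1.0], "var_mult": [1.0, 2.0, 4.0]})` would evaluate all nine combinations, where `score` is any function that returns a validation metric to minimize.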

## Requirements

* Python 3.x
* PyTorch
* NumPy
* Matplotlib
* tqdm

```bash
pip install torch numpy matplotlib tqdm