# Why do We Need New Benchmarks for Local Intrinsic Dimension Estimation

This repository contains the code for generating the datasets used in our local intrinsic dimension (LID) estimation benchmarks.

The benchmarks used in the paper (along with the PCA output used) can also be found [here](https://drive.google.com/file/d/1mGzGUVa37AjUREHx_vFoPzl1OCLjPJ1Q/view?usp=share_link) (password: `LocalIntrinsicDimensionBenchmarks`).

## Installation & running
If you have [`uv`](https://docs.astral.sh/uv/) installed, simply run:
```bash
uv run generate_datasets.py
```
This will:
- create a project-scoped virtual environment,
- download base datasets (FMNIST),
- generate synthetic datasets.

All artifacts will appear in `data/`.

## Generated artifacts

- `base_datasets` contains the original [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset,
- `pca.joblib` is the output of the PCA used in the IDR experiments,
- `benchmarks` contains the benchmarks (split into `train`, `val`, and `test`), which map to the experiments in the paper as follows.

| Folder | Experiment|
|--------|-----------|
| `e1_sampled_fmnist_step[1..13]` | Estimated LID vs sample size|
| `e1_spiral_pca` | Nearby manifolds: `Spiral (IDR)`|
| `e2_arrows` | Real-like dataset with known LID: `Arrows dataset (BMS)`|
| `e2_uniform_pca` | Boundaries of manifolds: `Uniform (IDR)`|
| `e5_padded_fmnist_adddim0` | Real-world dataset transformations: `FMNIST with added dimensions (ADI)`|
| `e5_padded_fmnist_adddim4` | Real-world dataset transformations: `FMNIST with added dimensions (ADI)`|
| `e5_padded_fmnist_adddim8` | Real-world dataset transformations: `FMNIST with added dimensions (ADI)`|
| `e5_stretched_power0.25` | Real-world dataset transformations: `Stretched FMNIST dataset (ME)`|
| `e5_stretched_power4` | Real-world dataset transformations: `Stretched FMNIST dataset (ME)`|
| `e5_upscaled_fmnist` | Real-world dataset transformations: `Upscaled FMNIST (ASE)`|
| `e6_exp_pca` | Nearby manifolds: `Funnel (IDR)`|
| `e7_crescent_moon_radius3.0` | Thin manifolds: `Moon (IDR)`|
| `e8_gaussian4_pca` | Non-uniform densities: `Gaussians (IDR)`|
| `e8_spaghetti_pca` | Manifold curvatures: `Spaghetti (IDR)`|
| `e8_sphere4_pca` | Manifold curvatures: `Sphere (IDR)` |
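
After generation (or after unpacking the downloaded archive), it can be useful to check that every folder from the table above is present. The sketch below is not part of the repository. It makes two assumptions: that `step[1..13]` stands for `step1` through `step13`, and that each benchmark is a directory with exactly the name shown in the table, located somewhere under `data/benchmarks`. The helper `missing_benchmarks` is a hypothetical name introduced for this example.

```python
from pathlib import Path

# Folder names copied from the table above. The step range is expanded
# under the assumption that `step[1..13]` means step1 through step13.
EXPECTED = [f"e1_sampled_fmnist_step{i}" for i in range(1, 14)] + [
    "e1_spiral_pca",
    "e2_arrows",
    "e2_uniform_pca",
    "e5_padded_fmnist_adddim0",
    "e5_padded_fmnist_adddim4",
    "e5_padded_fmnist_adddim8",
    "e5_stretched_power0.25",
    "e5_stretched_power4",
    "e5_upscaled_fmnist",
    "e6_exp_pca",
    "e7_crescent_moon_radius3.0",
    "e8_gaussian4_pca",
    "e8_spaghetti_pca",
    "e8_sphere4_pca",
]


def missing_benchmarks(root="data/benchmarks"):
    """Return the expected folder names that do not occur anywhere under `root`.

    Hypothetical helper: it searches recursively, so it does not depend on
    whether the train/val/test split sits above or below the benchmark folders.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing has been generated yet, so every folder is missing.
        return list(EXPECTED)
    found = {p.name for p in root.rglob("*") if p.is_dir()}
    return [name for name in EXPECTED if name not in found]
```

Calling `missing_benchmarks()` from the repository root returns an empty list when all folders are found. If the benchmarks are stored as files rather than directories, the name matching would need to be adapted.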

