# VAE Generalization Performance Analysis

This project is an experimental environment for evaluating the generalization performance of variational autoencoders (VAEs) using generalization error bounds derived from information-theoretic analysis.

## Project Structure

```
vae_it_analysis/
├── README.md             # Project overview
├── requirements.txt      # Python dependencies
├── .gitignore            # Git ignore rules
├── config/               # Experiment configuration files
├── scripts/              # Execution scripts
├── utils/                # Common utilities
├── notebooks/            # Jupyter notebooks
├── data/                 # Datasets
├── model/                # Model implementations
└── results/              # Experiment results
```

## Setup

```bash
pip install -r requirements.txt
```

## Quick Start

- Train a single MNIST VAE (uses K-fold, leave-one-out (LOO) style validation as configured in the YAML):
```bash
python scripts/train_vae_mnist.py --config config/mnist_vae_config.yaml
```
- Evaluate a finished experiment across all splits:
```bash
python scripts/evaluate_experiment.py \
  --experiment_dir results/experiments/mnist/vae_latent32_hidden512_256_128_64/train10000_beta0.1_lr0.0005
```

## Sweeping Training Sizes

Run the full pipeline for multiple `train_size` values, either log-spaced or listed explicitly.

- Log-spaced sizes:
```bash
# Log-spaced 7 points between 1k and 30k, train on CUDA, evaluate, then run MI on CPU
python scripts/sweep_train_size.py \
  --config config/mnist_vae_config.yaml \
  --device cuda \
  --num_points 7 --min_size 1000 --max_size 30000 \
  --run_mi --mi_device cpu --mi_max_splits 10 \
  --mi_max_samples_per_split -1 --mi_ef_max_train_samples -1 --mi_damping 1e-3
```
- Explicit sizes:
```bash
python scripts/sweep_train_size.py --config config/mnist_vae_config.yaml --device cuda \
  --sizes 1000 2000 5000 10000 20000 30000 --run_mi --mi_device cpu \
  --mi_max_splits 10 --mi_max_samples_per_split -1 --mi_ef_max_train_samples -1
```
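
For intuition about what "log-spaced" means here, the following is a minimal sketch of one way to generate such sizes. The function name `log_spaced_sizes` is hypothetical, and `sweep_train_size.py` may round or space its points differently.

```python
import math


def log_spaced_sizes(min_size: int, max_size: int, num_points: int) -> list[int]:
    """Return integer sizes spaced evenly on a log scale (endpoints included)."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    log_min, log_max = math.log(min_size), math.log(max_size)
    step = (log_max - log_min) / (num_points - 1)
    sizes = [round(math.exp(log_min + i * step)) for i in range(num_points)]
    # Rounding can create duplicates for narrow ranges, so deduplicate.
    return sorted(set(sizes))


sizes = log_spaced_sizes(1000, 30000, 7)
print(sizes)
```

Each size is a constant multiple of the previous one, so small training sets are sampled more densely than large ones.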

### What the sweep does
1) For each `train_size`, it runs `train_vae_mnist.py` with that size; all other hyperparameters come from the YAML config.
2) It then runs `evaluate_experiment.py`, which writes `evaluation_aggregated.json` to the experiment directory.
3) If `--run_mi` is set, it runs `estimate_mi_params_u.py` to compute the influence-function (IF) based upper bound on the mutual information I(parameters; U | X^n) and saves it as `mi_params_u.json` in the same directory.
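
The three steps can be sketched as a list of commands for one `train_size`. This is an illustration only, not the code in `sweep_train_size.py`: the helper `build_sweep_commands` is hypothetical, the `--train_size` flag on the training script is an assumed name for the CLI override, and the experiment directory is passed in rather than derived from the config.

```python
import shlex


def build_sweep_commands(config, train_size, experiment_dir,
                         run_mi=False, mi_device="cpu"):
    """Build the train -> evaluate -> (optional) MI commands for one size."""
    commands = [
        # Step 1: train. NOTE: `--train_size` is an assumed flag name.
        ["python", "scripts/train_vae_mnist.py",
         "--config", config, "--train_size", str(train_size)],
        # Step 2: aggregate test metrics across splits.
        ["python", "scripts/evaluate_experiment.py",
         "--experiment_dir", experiment_dir],
    ]
    if run_mi:
        # Step 3: MI upper-bound estimation, only when requested.
        commands.append(
            ["python", "scripts/estimate_mi_params_u.py",
             "--experiment_dir", experiment_dir, "--device", mi_device]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical experiment directory, used only for printing.
    for cmd in build_sweep_commands("config/mnist_vae_config.yaml", 1000,
                                    "results/experiments/demo", run_mi=True):
        print(shlex.join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)` so that a failed step stops the sweep.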

## MI Upper-Bound Estimation (IF-based)

- Script: `scripts/estimate_mi_params_u.py`
- Purpose: Deterministic approximation of the upper bound using influence functions (IF) and a diagonal empirical Fisher (EF) matrix.
- Key options:
  - `--device {cpu|cuda}`: computation device (CPU is recommended for reproducibility)
  - `--max_splits N`: limit number of splits processed (omit to use all splits found in `data_splits/`)
  - `--max_samples_per_split K`: number of training samples per split used in the bound; `-1` means use all training samples
  - `--ef_max_train_samples K`: samples used to estimate EF diagonal; `-1` means use all training samples
  - `--damping 1e-3`: diagonal damping term for numerical stability

Example (single experiment):
```bash
python scripts/estimate_mi_params_u.py \
  --experiment_dir results/experiments/mnist/vae_latent32_hidden512_256_128_64/train10000_beta0.1_lr0.0005 \
  --device cpu --max_splits 10 --max_samples_per_split -1 --ef_max_train_samples -1 --damping 1e-3
```
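
To make the roles of the EF diagonal and `--damping` more concrete, here is a minimal pure-Python sketch. It is not the code in `estimate_mi_params_u.py`: the function names and toy gradients are hypothetical, and how these quantities enter the actual bound is not reproduced here. A real implementation would operate on tensors rather than lists.

```python
def diagonal_empirical_fisher(per_sample_grads):
    """Diagonal of the empirical Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]


def damped_inverse_apply(fisher_diag, vector, damping=1e-3):
    """Apply (diag(F) + damping * I)^{-1} to a vector, elementwise."""
    return [v / (f + damping) for f, v in zip(fisher_diag, vector)]


# Toy per-sample gradients for a 2-parameter model (hypothetical values).
grads = [[1.0, 0.0], [3.0, 0.0]]
fisher_diag = diagonal_empirical_fisher(grads)  # [5.0, 0.0]

# The second Fisher entry is zero, so an undamped inverse would divide by zero;
# the damping term keeps the inverse well defined.
step = damped_inverse_apply(fisher_diag, grads[0], damping=1e-3)
print(fisher_diag, step)
```

In influence-function approximations, a damped inverse-curvature product of this kind estimates how much the parameters move when a single training sample is reweighted; the diagonal form keeps the cost linear in the number of parameters.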

## Experiment Features

- Configurable training data size (`data.train_size` in YAML; overridable via CLI)
- K-fold leave-one-out (LOO) style validation (`data.leave_one_out_ratio` × `validation.num_folds`)
- MLP-based VAE model
- Systematic experiment management and reproducible data splits
- Per-split checkpoints and aggregated metrics
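
The exact semantics of `data.leave_one_out_ratio` and `validation.num_folds` are defined by the training script. The sketch below shows only one plausible reading, in which each fold holds out a seeded random fraction of the training indices. The function `make_splits` and the dictionary keys are hypothetical and may not match the contents of `split_*.json`.

```python
import random


def make_splits(train_size, leave_one_out_ratio, num_folds, seed=0):
    """Create reproducible train/held-out index splits, one per fold."""
    n_held_out = max(1, round(train_size * leave_one_out_ratio))
    splits = []
    for fold in range(num_folds):
        # A per-fold seed makes every split reproducible on its own.
        rng = random.Random(seed + fold)
        held_out = sorted(rng.sample(range(train_size), n_held_out))
        held_out_set = set(held_out)
        train = [i for i in range(train_size) if i not in held_out_set]
        splits.append({
            "split_id": fold,
            "train_indices": train,
            "held_out_indices": held_out,
        })
    return splits


splits = make_splits(train_size=100, leave_one_out_ratio=0.1, num_folds=5, seed=42)
print([len(s["held_out_indices"]) for s in splits])
```

Because the splits depend only on the seed and the configuration values, rerunning with the same inputs reproduces the same indices.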

## Results Layout

```
results/experiments/{dataset}/{model_name}/{experiment_id}/
├── data_splits/               # split_*.json + experiment_metadata.json
├── split_0/                   # per-split results and figures
├── split_1/
├── ...
├── aggregated_results.json    # training/eval summary
├── evaluation_aggregated.json # test metrics aggregated (from evaluate_experiment.py)
└── mi_params_u.json           # MI upper-bound results (when run)
```
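
A small sketch for collecting the top-level JSON outputs of one experiment is shown below. Only the file names come from the layout above; the helper `load_experiment_outputs` is hypothetical, and no assumption is made about the keys inside each file.

```python
import json
from pathlib import Path

RESULT_FILES = (
    "aggregated_results.json",
    "evaluation_aggregated.json",
    "mi_params_u.json",
)


def load_experiment_outputs(experiment_dir):
    """Load whichever top-level result files exist in an experiment directory."""
    experiment_dir = Path(experiment_dir)
    outputs = {}
    for name in RESULT_FILES:
        path = experiment_dir / name
        # mi_params_u.json only exists if MI estimation was run, so skip gaps.
        if path.exists():
            with path.open() as f:
                outputs[name] = json.load(f)
    return outputs


if __name__ == "__main__":
    outputs = load_experiment_outputs(
        "results/experiments/mnist/vae_latent32_hidden512_256_128_64/"
        "train10000_beta0.1_lr0.0005"
    )
    print(sorted(outputs))
```

Missing files are skipped rather than treated as errors, so the same helper works before and after the MI step.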

## Reproducibility Notes

- Device selection: use `--device` (training) and `--mi_device` (MI). CPU runs are generally more deterministic than CUDA runs.
- MI estimation introduces no additional randomness: sample selection is deterministic (the first N samples are used).
- Number of splits is read from `data_splits/experiment_metadata.json`.
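
The "first N samples" rule, together with the `-1` convention used by the MI options, can be sketched as follows. The helper `select_first_n` is hypothetical and only illustrates the selection logic, not the script's actual code.

```python
def select_first_n(indices, max_samples):
    """Deterministically pick the first `max_samples` indices; -1 means all."""
    if max_samples == -1:
        return list(indices)
    if max_samples < 0:
        raise ValueError("max_samples must be -1 or a non-negative integer")
    # Slicing never shuffles, so repeated runs select identical samples.
    return list(indices[:max_samples])


train_indices = [7, 3, 9, 1, 4]
print(select_first_n(train_indices, 3))   # [7, 3, 9]
print(select_first_n(train_indices, -1))  # [7, 3, 9, 1, 4]
```

Since no random sampling is involved, two runs over the same split files process exactly the same samples in the same order.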

## Git Management

This repository is configured to ignore:
- Large data files (`data/raw/`, `data/processed/`)
- Model checkpoints (`model/checkpoints/`, `*.pth`)
- Experiment results (`results/experiments/`)
- Logs and temporary files
- IDE and OS-specific files

What's tracked:
- Source code (`scripts/`, `utils/`, `model/`)
- Configuration files (`config/`)
- Documentation (`README.md`)
- Dependencies (`requirements.txt`)

## License

[Add your license information here] 