
# ImpuGen: Unified Tabular Imputation and Generation via Task-Aligned Sampling Strategies

## 1. Setup — one‑line installer
```bash
python install.py --python 3.12 --cp cu126 --name impugen_aaai
```

| Step  | Action                                                                                                              |
|-------|---------------------------------------------------------------------------------------------------------------------|
| **1** | Create a new conda env named **`impugen_aaai`** (`--name`) with Python **`3.12`** (`--python`).                    |
| **2** | Install a **PyTorch 2.x** (or newer) wheel matching your CUDA version, selected with `--cp` (`cu118`, `cu126`, `cu128`, or `cpu`). |
| **3** | Install all additional libraries from **`requirements.txt`** (Lightning, sdmetrics, etc.).                          |
| **4** | Automatically **download the benchmark datasets** (UCI, scikit-learn).                                              |
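If `install.py` cannot be used, steps 1 to 3 can be reproduced by hand. The sketch below is a hypothetical approximation, not the repository's installer: it assumes PyTorch is installed from the official wheel index `https://download.pytorch.org/whl/<variant>` and that `requirements.txt` is in the current directory. Step 4 (dataset download) is not covered, because the installer's download logic is not documented here. The names `torch_index_url`, `run`, `manual_install`, and `DRY_RUN` are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical manual equivalent of install.py steps 1-3; NOT the repository's script.
# Assumption: PyTorch wheels come from https://download.pytorch.org/whl/<variant>.

# Map a variant (cu118, cu126, cu128, cpu) to its PyTorch wheel index URL.
torch_index_url() {
  case "$1" in
    cu118|cu126|cu128|cpu) echo "https://download.pytorch.org/whl/$1" ;;
    *) echo "unsupported variant: $1" >&2; return 1 ;;
  esac
}

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: manual_install <env_name> <python_version> <variant>
manual_install() {
  local env_name="$1" py_version="$2" variant="$3" url
  url="$(torch_index_url "$variant")" || return 1
  # Step 1: create the conda environment.
  run conda create -y -n "$env_name" "python=$py_version" || return 1
  # Step 2: install a PyTorch wheel built for the chosen CUDA variant.
  run conda run -n "$env_name" pip install torch --index-url "$url" || return 1
  # Step 3: install the remaining dependencies.
  run conda run -n "$env_name" pip install -r requirements.txt
}
```

For example, `DRY_RUN=1 manual_install impugen_aaai 3.12 cu126` only prints the three commands; drop `DRY_RUN=1` to execute them.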

> **Prerequisite:** Conda (Miniconda or Mambaforge) must be installed,
> and the installer should be run from the default `base` environment (i.e., with no other conda environment activated).
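A quick way to confirm the prerequisite is to inspect `CONDA_DEFAULT_ENV`, which conda sets to the name of the currently active environment. The helper below is a hypothetical sketch, not shipped with the repository (`check_no_other_env` is an illustrative name); it accepts `base` or an unset variable and rejects any other active environment.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository).
# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
check_no_other_env() {
  local active="${CONDA_DEFAULT_ENV:-}"
  if [[ -n "$active" && "$active" != "base" ]]; then
    echo "environment '$active' is active; run 'conda activate base' first" >&2
    return 1
  fi
  echo "ok"
}
```

It can guard the installer: `check_no_other_env && python install.py --python 3.12 --cp cu126 --name impugen_aaai`.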

Activate the environment later with:
```bash
conda activate impugen_aaai
```

---

## 2. Run experiments

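### 2.1 Imputation (MCAR, 30% missing)
`scenario=mcar` removes values missing completely at random (MCAR), and `scenario.p=0.3` sets the missing rate to 30%.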
```bash
CUDA_VISIBLE_DEVICES=<gpu_id> \
python run_experiments.py model=impugen scenario=mcar scenario.p=0.3
```
> **Models** (values for `model=`): `impugen` `diffputer` `knewimp` `simpdm` `remasker` `macode` `hyperimpute` `missforest` `em` `mice` `gain`
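To benchmark several imputation models at more than one missing rate, the command can be wrapped in a loop. This is a hypothetical sketch: `sweep_imputation` and `DRY_RUN` are illustrative names, and the extra rates 0.1 and 0.5 assume `scenario.p` accepts fractions other than the documented 0.3.

```bash
#!/usr/bin/env bash
# Hypothetical sweep helper (not part of the repository): runs the MCAR
# imputation benchmark for every given model at several missing rates.
# Assumption: scenario.p accepts any fraction in (0, 1); only 0.3 is documented.
# Set DRY_RUN=1 to print the commands instead of executing them.
# Usage: sweep_imputation <gpu_id> <model>...
sweep_imputation() {
  local gpu="$1"
  shift
  local rates=(0.1 0.3 0.5)   # hypothetical missing rates
  local model p
  for model in "$@"; do
    for p in "${rates[@]}"; do
      local cmd=(python run_experiments.py "model=$model" scenario=mcar "scenario.p=$p")
      if [[ "${DRY_RUN:-0}" == "1" ]]; then
        echo "CUDA_VISIBLE_DEVICES=$gpu ${cmd[*]}"
      else
        # Stop at the first failing run.
        CUDA_VISIBLE_DEVICES="$gpu" "${cmd[@]}" || return 1
      fi
    done
  done
}
```

For example, `DRY_RUN=1 sweep_imputation 0 impugen diffputer mice` prints the nine commands (three models times three rates) without running them.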

### 2.2 Generation
```bash
CUDA_VISIBLE_DEVICES=<gpu_id> \
python run_experiments.py model=impugen
```
> **Models** (values for `model=`): `impugen` `tabnat` `tabdiff` `tabsyn` `tabddpm` `macode` `ctgan` `tvae`

### 2.3 Privacy metrics
```bash
CUDA_VISIBLE_DEVICES=<gpu_id> \
python run_experiments_privacy.py model=impugen
```
> **Models** (values for `model=`): `impugen` `tabnat` `tabdiff` `tabsyn` `tabddpm` `macode` `ctgan` `tvae`
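The generation and privacy scripts accept the same `model=` values, so both can be chained per model and spread across several GPUs. In this hypothetical sketch (`bench_generation` and `DRY_RUN` are illustrative names), models are assigned to the given GPU ids round-robin and each model runs as one background job. Running the privacy script after generation is a scheduling choice here, not a documented requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): assign generation models to
# GPUs round-robin; each model gets one background job that runs the generation
# benchmark and then the privacy benchmark on its GPU.
# Set DRY_RUN=1 to print one command line per model instead of launching jobs.
# Usage: bench_generation "<gpu_id> <gpu_id>..." <model>...
bench_generation() {
  local -a gpus
  read -r -a gpus <<< "$1"
  shift
  (( ${#gpus[@]} > 0 )) || { echo "no GPU ids given" >&2; return 1; }
  local i=0 model gpu
  for model in "$@"; do
    gpu="${gpus[i % ${#gpus[@]}]}"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "CUDA_VISIBLE_DEVICES=$gpu python run_experiments.py model=$model && CUDA_VISIBLE_DEVICES=$gpu python run_experiments_privacy.py model=$model"
    else
      # The trailing & backgrounds the whole && list, so privacy runs only if generation succeeds.
      CUDA_VISIBLE_DEVICES="$gpu" python run_experiments.py "model=$model" &&
        CUDA_VISIBLE_DEVICES="$gpu" python run_experiments_privacy.py "model=$model" &
    fi
    i=$((i + 1))
  done
  wait   # block until every background job has finished
}
```

For example, `bench_generation "0 1" impugen tabsyn ctgan` places `impugen` and `ctgan` on GPU 0 and `tabsyn` on GPU 1. With more models than GPUs, several jobs share a GPU at the same time and may run out of memory; pass as many GPU ids as models, or run the models one after another instead.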

## 3. Single run

```bash
CUDA_VISIBLE_DEVICES=<gpu_id> \
python run.py dataset=<dataset> seed=<seed> model=<model> scenario=<scenario>
```
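`run.py` takes one seed per call, so repeated runs over several seeds can be scripted with a small loop. This hypothetical sketch (`run_seeds` and `DRY_RUN` are illustrative names) keeps the dataset, model, and scenario as caller-supplied placeholders and stops at the first failing run.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): repeat a single run.py run
# over several seeds. Dataset, model, and scenario values are supplied by the caller.
# Set DRY_RUN=1 to print the commands instead of executing them.
# Usage: run_seeds <gpu_id> <dataset> <model> <scenario> <seed>...
run_seeds() {
  if (( $# < 5 )); then
    echo "usage: run_seeds <gpu_id> <dataset> <model> <scenario> <seed>..." >&2
    return 1
  fi
  local gpu="$1" dataset="$2" model="$3" scenario="$4" seed
  shift 4
  for seed in "$@"; do
    local cmd=(python run.py "dataset=$dataset" "seed=$seed" "model=$model" "scenario=$scenario")
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "CUDA_VISIBLE_DEVICES=$gpu ${cmd[*]}"
    else
      CUDA_VISIBLE_DEVICES="$gpu" "${cmd[@]}" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 run_seeds 0 <dataset> impugen mcar 0 1 2` prints the three `run.py` commands for seeds 0, 1, and 2.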

_All logs, checkpoints, and evaluation reports are saved in **`Experiments/`**._
