
# Replication Code for TabCascade

## Install

The environment depends on `rpy2`, so `R` must be installed on the system; otherwise `uv sync` will fail.
Make sure `uv` is installed, then run `uv sync`.
In addition, the distributional tree encoder needs several R packages.
These are installed when running `uv run disttree/disttree.py`. For more information about disttree, see the README in `/disttree`.
Note that some systems may need custom steps, depending on the R installation and how it interacts with `rpy2`.
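
Since `uv sync` fails when `R` is missing, it can help to check that both tools are on `PATH` first. The snippet below is a minimal sketch and not part of this repository; the helper name `missing_tools` is hypothetical, and it only checks that the executables can be found, not that `rpy2` can actually link against the R installation.

```python
import shutil


def missing_tools(tools=("R", "uv")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("R and uv found; `uv sync` can be attempted.")
```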

## Running scripts

All configuration files are located in `experiments/configs`.
To replicate the experiments, run the following from the project root:
`uv run main.py DATA experiments/configs/MODEL/default.yaml train SEED -miss_mechanism=mnar --exp_path=MODEL_SEED`.
Substitute DATA, MODEL, and SEED as required. `exp_path` can be chosen freely. Results are saved in `results/DATA/exp_path`.

The config files for each model are located in `experiments/configs/MODEL`. For each model, the `default.yaml` config corresponds to the main results.

Use the `train` option to train a model, and switch to `eval` to evaluate an already trained model.
Evaluation does not require a GPU.

Available models:
- arf
- ctgan
- tvae
- tabddpm
- tabsyn
- tabdiff
- cdtd
- tabcascade (`default.yaml` for the DT encoder and `default_gmm.yaml` for the GMM encoder)

Available datasets:
- adult
- airlines
- beijing
- credit_g
- default
- diabetes
- electricity
- kc1
- news
- nmes
- phoneme
- shoppers

Seeds used in the paper are 0, 42, 86.
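
To cover every combination of dataset, model, and seed listed above, the command from "Running scripts" can be generated programmatically. The following is a minimal sketch, not a script shipped with this repository: `build_command` and `all_commands` are hypothetical helpers, the flags are copied verbatim from the command above, and the `MODEL_SEED` naming for `exp_path` simply follows that example. It only prints the commands (a dry run) and uses `default.yaml` for every model.

```python
import itertools

MODELS = ["arf", "ctgan", "tvae", "tabddpm", "tabsyn", "tabdiff", "cdtd", "tabcascade"]
DATASETS = [
    "adult", "airlines", "beijing", "credit_g", "default", "diabetes",
    "electricity", "kc1", "news", "nmes", "phoneme", "shoppers",
]
SEEDS = [0, 42, 86]  # seeds used in the paper


def build_command(data, model, seed, mode="train", config="default.yaml"):
    """Build the argument list for one run, mirroring the README command."""
    return [
        "uv", "run", "main.py", data,
        f"experiments/configs/{model}/{config}",
        mode, str(seed),
        "-miss_mechanism=mnar",          # copied verbatim from the README
        f"--exp_path={model}_{seed}",    # exp_path can be chosen freely
    ]


def all_commands(mode="train"):
    """Return one command per (dataset, model, seed) combination."""
    return [
        build_command(data, model, seed, mode)
        for data, model, seed in itertools.product(DATASETS, MODELS, SEEDS)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in all_commands("train"):
        print(" ".join(cmd))
```

Passing `mode="eval"` produces the matching evaluation commands, and `config="default_gmm.yaml"` in `build_command` selects the GMM encoder variant of tabcascade.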




