# TabM<!-- omit in toc -->

# How to reproduce the results

## Set up the environment

### Software

**Step 1.**

Download the project (for example, by cloning the repository).

**Step 2.** Set up [Micromamba](https://mamba.readthedocs.io/en/latest/installation.html#manual-installation), [Mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html), or [Conda](https://docs.conda.io/projects/miniconda/en/latest/) (Micromamba or Mamba is recommended).

**Step 3.**

If you use Micromamba:

```shell
micromamba create -f environment.yaml
micromamba activate tabm
```

If you use Mamba:

```shell
mamba env create -f environment.yaml
mamba activate tabm
```

If you use Conda (with Conda, the first command can take an *extremely* long time, to the point that waiting for it to complete becomes impractical):

```shell
conda env create -f environment.yaml
conda activate tabm
```
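
After activation, it can be useful to confirm that the `tabm` environment is really the active one before running anything. The following is a minimal sketch, not part of the repository: the function name `check_active_env` is hypothetical, and the check assumes that the activation command exports `CONDA_DEFAULT_ENV`, as Conda does.

```shell
# Hypothetical helper (not part of the repository): report whether the
# expected environment is active. Assumes the activation command exports
# CONDA_DEFAULT_ENV, as Conda does.
check_active_env() {
    expected="${1:-tabm}"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    echo "Environment '$expected' is active"
}
```

Calling `check_active_env` without arguments checks for `tabm`; a different name can be passed as the first argument.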

### Data

(***License:** we do not impose any new license restrictions in addition to the original licenses of the datasets used.
See the paper to learn about the dataset sources.*)

Navigate to the repository root and run the following commands:
```shell
wget https://huggingface.co/datasets/puhsu/tabular-benchmarks/resolve/main/data.tar -O tabular-dl-tabr.tar.gz
tar -xvf tabular-dl-tabr.tar.gz
```

After that, the `data/` directory should appear.
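
To verify that the archive was unpacked as expected, a quick check along the following lines may help. This is only a sketch: `check_data_dir` is a hypothetical helper, and it checks just that the directory exists and is non-empty, since the layout inside `data/` is not described here.

```shell
# Hypothetical helper (not part of the repository): confirm that the
# archive was unpacked into a non-empty directory (default: data).
check_data_dir() {
    dir="${1:-data}"
    if [ ! -d "$dir" ]; then
        echo "Missing directory: $dir" >&2
        return 1
    fi
    # Count the top-level entries, including hidden ones.
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "Directory is empty: $dir" >&2
        return 1
    fi
    echo "$dir contains $count entries"
}
```

Run `check_data_dir` from the repository root; it returns a non-zero status if `data/` is missing or empty.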

### Environment variables

**When running scripts, the environment variable `CUDA_VISIBLE_DEVICES` must be explicitly set.** The commands in the rest of this document assume that you have run the following command first:

```shell
export CUDA_VISIBLE_DEVICES="0"
```
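
Because a missing variable is easy to overlook in a new shell session, a small guard can catch it before a long run starts. The sketch below is an illustration, not part of the repository; the function name `check_cuda_visible_devices` is hypothetical.

```shell
# Hypothetical helper (not part of the repository): return a non-zero
# status when CUDA_VISIBLE_DEVICES is unset or empty.
check_cuda_visible_devices() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "CUDA_VISIBLE_DEVICES is not set" >&2
        return 1
    fi
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
}
```

One possible use is to chain it before a script, for example `check_cuda_visible_devices && python bin/model.py <config>`.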

## Main files description

- `bin/model.py` -- performs one training run of TabM, TabM-mini, TabM-naive, or MLP (optionally with numerical embeddings)
- `bin/model_analysis.py` -- performs analysis of training dynamics (Subsection 5.2)
- `bin/tune.py` -- performs hyperparameter tuning
- `bin/evaluate.py` -- performs evaluation over multiple seeds
- `bin/ensemble.py` -- performs ensembling of multiple seeds
- `bin/go.py` -- runs the three scripts above: `tune` + `evaluate` + `ensemble`
- `exp/` -- hyperparameter configurations and results
- `lib/` -- common functions and utilities (used by the scripts in `bin`)

## Usage example

Create a directory for reproducing the results (instead of `tabm` and `california`, any other model and dataset can be used):

```shell
mkdir -p exp/reproduce/tabm/california
```

To simply train a model once without hyperparameter tuning, use `bin/model.py` directly:

```shell
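# Copy the existing hyperparameters config
# (the destination subdirectory must exist before copying).
mkdir -p exp/reproduce/tabm/california/0-evaluation
cp exp/tabm/california/0-evaluation/0.toml exp/reproduce/tabm/california/0-evaluation/0.toml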

# Run the training.
python bin/model.py exp/reproduce/tabm/california/0-evaluation/0.toml
```

To run the whole pipeline with hyperparameter tuning:

```shell
# Copy the existing hyperparameter tuning config.
cp exp/tabm/california/0-tuning.toml exp/reproduce/tabm/california/0-tuning.toml

# Run the pipeline.
python bin/go.py exp/reproduce/tabm/california/0-tuning.toml
```
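
Since any model and dataset can be substituted for `tabm` and `california`, the directory creation and config copy above can be wrapped in a small function. This is a sketch under the assumption that other models and datasets follow the same `exp/<model>/<dataset>/0-tuning.toml` layout as the example; `prepare_reproduce` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): copy the tuning config
# of a given model and dataset into exp/reproduce/ and print the new path.
# Assumes the layout exp/<model>/<dataset>/0-tuning.toml.
prepare_reproduce() {
    model="$1"
    dataset="$2"
    src="exp/${model}/${dataset}/0-tuning.toml"
    dst_dir="exp/reproduce/${model}/${dataset}"
    if [ ! -f "$src" ]; then
        echo "Config not found: $src" >&2
        return 1
    fi
    mkdir -p "$dst_dir"
    cp "$src" "$dst_dir/0-tuning.toml"
    echo "$dst_dir/0-tuning.toml"
}
```

From the repository root, `config=$(prepare_reproduce tabm california) && python bin/go.py "$config"` would then reproduce the two steps of the example above in one line.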
