# MET: Masked Encoding for Tabular Data

## Requirements

The experiments in the paper require Python 3.7 or later. To install the requirements, run:

```setup
pip install -r requirements.txt
```

## Standard Training (MET-S)

To train the MET-S model from the paper (the model without the adversarial training step) on the FashionMNIST dataset, run:

```train
python3 train.py
```

To perform a hyper-parameter search, use [hyper_param_tune.sh](./hyper_param_tune.sh):
```
bash hyper_param_tune.sh
```
This writes the results to met.csv, from which we select the best set of hyper-parameters.
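As a rough illustration of that selection step, the sketch below picks the row with the highest value in one metric column using only Python's `csv` module. The column names (`val_acc` in particular) and the sample rows are hypothetical placeholders, not the real layout of met.csv and not results from the paper; check the header that hyper_param_tune.sh actually writes and pass the matching metric name.

```python
import csv
import io


def best_row(lines, metric="val_acc"):
    """Return the CSV row (as a dict) with the largest value in `metric`.

    `lines` is any iterable of CSV text with a header row. The default
    column name "val_acc" is a hypothetical placeholder.
    """
    reader = csv.DictReader(lines)
    return max(reader, key=lambda row: float(row[metric]))


# Made-up rows for illustration only; for real results use
# open("met.csv", newline="") instead of this StringIO.
sample = io.StringIO(
    "embed_dim,model_depth_enc,mask_pct,lr,val_acc\n"
    "64,1,70,1e-05,0.6\n"
    "64,2,50,1e-04,0.4\n"
)
best = best_row(sample)
print(best["embed_dim"], best["model_depth_enc"], best["mask_pct"], best["lr"])
```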

The following hyper-parameters are available for train.py:
+ **embed_dim** : Embedding dimension
+ **ff_dim** : Feed-Forward dimension
+ **num_heads** : Number of heads
+ **model_depth_enc** : Depth of the encoder, i.e. the number of transformer blocks in the encoder stack
+ **model_depth_dec** : Depth of the decoder, i.e. the number of transformer blocks in the decoder stack
+ **mask_pct** : Masking Percentage
+ **lr** : Learning rate

Each of these can be set by passing --flag_name=flag_value to train.py. For example:
```
python3 train.py --model_depth_enc=1
```
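Because every hyper-parameter is an ordinary flag, a search amounts to launching train.py once per combination of values. The sketch below builds those command lines with `itertools.product`. The grid values are hypothetical and are not the ones used in hyper_param_tune.sh, and the commands are only printed (a dry run) unless you hand them to `subprocess.run`.

```python
import itertools

# Hypothetical search space; the values are illustrative, not the grid
# used in hyper_param_tune.sh.
grid = {
    "embed_dim": [64, 128],
    "model_depth_enc": [1, 2],
    "mask_pct": [50, 70],
}


def build_commands(script, grid):
    """Return one ["python3", script, "--flag=value", ...] list per combination."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        flags = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append(["python3", script, *flags])
    return commands


commands = build_commands("train.py", grid)
for cmd in commands:
    # Dry run: only print the command. To execute it instead, use
    # subprocess.run(cmd, check=True) from the standard library.
    print(" ".join(cmd))
```

Passing "train_adv.py" as the script and adding keys such as radius gives the same kind of grid for the adversarial variant.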

By default, the model is saved to [saved_models](./saved_models/).
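Of the flags above, mask_pct is the one specific to masked encoding: in a masked-reconstruction setup, roughly mask_pct percent of each example's feature positions are hidden and the model learns to reconstruct them from the visible ones. The sketch below draws one such random split with the standard library. It assumes mask_pct is a percentage rounded down to a whole number of features, and it illustrates random feature masking in general rather than the masking code in train.py; the helper name `sample_mask` is hypothetical.

```python
import random


def sample_mask(num_features, mask_pct, rng):
    """Split feature positions into (masked, visible) index lists.

    Hypothetical helper: mask_pct is treated as a percentage in [0, 100]
    and the number of masked positions is rounded down.
    """
    num_masked = int(num_features * mask_pct / 100)
    masked = set(rng.sample(range(num_features), num_masked))
    masked_idx = sorted(masked)
    visible_idx = [i for i in range(num_features) if i not in masked]
    return masked_idx, visible_idx


rng = random.Random(0)
# 784 features, e.g. a flattened 28x28 FashionMNIST image.
masked, visible = sample_mask(num_features=784, mask_pct=70, rng=rng)
print(len(masked), len(visible))  # 548 236
```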

## Adversarial Training (MET)

To train the MET model from the paper (with adversarial training) on the FashionMNIST dataset, run:
```train
python3 train_adv.py
```
To perform a hyper-parameter search, use [hyper_param_tune.sh](./hyper_param_tune.sh) after replacing train.py with train_adv.py in the script:
```
bash hyper_param_tune.sh
```
This writes the results to met.csv, from which we select the best set of hyper-parameters.
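One way to make that replacement without editing the original script is to write a modified copy. The sketch below does this in Python; the output name hyper_param_tune_adv.sh is a hypothetical choice, and the word-boundary regex only rewrites standalone occurrences of train.py (so a name such as pretrain.py is left alone).

```python
import re
from pathlib import Path


def make_adv_script(src="hyper_param_tune.sh", dst="hyper_param_tune_adv.sh"):
    """Copy `src` to `dst`, replacing each standalone train.py with train_adv.py.

    The destination file name is a hypothetical choice.
    """
    text = Path(src).read_text()
    # \b on both sides keeps names like "pretrain.py" unchanged.
    new_text = re.sub(r"\btrain\.py\b", "train_adv.py", text)
    Path(dst).write_text(new_text)
    return new_text
```

After calling `make_adv_script()`, the copy can be run with `bash hyper_param_tune_adv.sh`.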

The following hyper-parameters are available for train_adv.py:
+ **embed_dim** : Embedding dimension
+ **ff_dim** : Feed-Forward dimension
+ **num_heads** : Number of heads
+ **model_depth_enc** : Depth of the encoder, i.e. the number of transformer blocks in the encoder stack
+ **model_depth_dec** : Depth of the decoder, i.e. the number of transformer blocks in the decoder stack
+ **mask_pct** : Masking Percentage
+ **lr** : Learning rate
+ **radius** : Radius of the L2-norm ball around each input data point
+ **adv_steps** : Number of steps in the adversarial loop
+ **lr_adv** : Learning rate for the adversarial steps

Each of these can be set by passing --flag_name=flag_value to train_adv.py. For example:
```
python3 train_adv.py --radius=14
```

By default, the model is saved to [saved_models](./saved_models/).
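The three adversarial flags fit the usual projected-gradient pattern: starting from the input, take adv_steps gradient-ascent steps of size lr_adv on the loss, and after each step project the perturbation back onto the L2 ball of the given radius. The sketch below shows that generic pattern on a toy quadratic loss with a hand-written gradient. It is an illustration under those assumptions, not the exact update in train_adv.py, which may differ in where the perturbation is applied, whether gradients are normalized, or how the perturbation is initialized. The function names are hypothetical.

```python
import math


def project_l2(delta, radius):
    """Rescale `delta` onto the L2 ball of the given radius if it lies outside."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= radius:
        return delta
    return [d * radius / norm for d in delta]


def adversarial_perturbation(x, grad_fn, radius, adv_steps, lr_adv):
    """Projected gradient ascent on the loss, constrained to ||delta||_2 <= radius."""
    delta = [0.0] * len(x)
    for _ in range(adv_steps):
        grad = grad_fn([xi + di for xi, di in zip(x, delta)])
        delta = [di + lr_adv * gi for di, gi in zip(delta, grad)]
        delta = project_l2(delta, radius)
    return delta


def toy_loss_grad(z):
    """Gradient of the toy loss L(z) = ||z||^2 / 2, which is z itself."""
    return list(z)


x = [3.0, 4.0]
delta = adversarial_perturbation(x, toy_loss_grad, radius=1.0, adv_steps=10, lr_adv=0.5)
print(delta)
```

With this toy loss the perturbation ends up on the boundary of the ball, pointing in the direction of x, which is where the loss increases fastest.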

## Adding a new dataset

You can use the model on a new dataset by converting it to a CSV file. The first column of the CSV file should be the class label, followed by the attribute columns. Sample CSV files are available in [data](./data/).
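A minimal sketch of writing data in that label-first layout with the standard `csv` module is shown below. Whether the scripts expect a header row is not stated here, so the sketch writes none; compare with the sample files in [data](./data/) before relying on it. The function name, the file name my_train.csv, and the toy rows are hypothetical.

```python
import csv
import io


def write_label_first_csv(f, labels, features):
    """Write one row per example: class label, then its attribute values.

    No header row is written; whether the training scripts expect one is
    an assumption to check against the sample files in data/.
    """
    writer = csv.writer(f, lineterminator="\n")
    for label, row in zip(labels, features):
        writer.writerow([label, *row])


# Toy data; for a real file use open("my_train.csv", "w", newline="").
buf = io.StringIO()
write_label_first_csv(buf, labels=[0, 1], features=[[0.1, 2.5], [3.0, -1.2]])
print(buf.getvalue())
```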

To pass the CSV files to any of the training and evaluation scripts, use the following flags:
+ **num_classes** : Number of classes
+ **model_kw** : Keyword for the model (e.g. fmnist for FashionMNIST)
+ **train_data_path** : Path to train csv file
+ **val_data_path** : Path to validation csv file
+ **test_data_path** : Path to test csv file

- By default, models are stored in [saved_models](./saved_models/). You can change this location with the flag **model_path**.
- A synthetic 2D dataset can be created using [get_2d_data.py](./data/get_2d_data.py). A pre-generated dataset is available at [data/2d_train.csv](./data/2d_train.csv).
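Before passing a new CSV file with these flags, a quick consistency check can catch ragged rows or labels that do not fit **num_classes**. The sketch below assumes integer class labels 0 to num_classes - 1 and no header row; both are assumptions rather than documented requirements of the scripts, and the helper name `check_dataset` is hypothetical.

```python
import csv
import io


def check_dataset(f, num_classes):
    """Validate a label-first CSV and return (num_rows, num_attributes).

    Assumes integer labels in [0, num_classes) and no header row.
    Raises ValueError on the first inconsistent row.
    """
    width = None
    num_rows = 0
    for line_no, row in enumerate(csv.reader(f), start=1):
        if width is None:
            width = len(row)
        elif len(row) != width:
            raise ValueError(f"row {line_no}: expected {width} columns, got {len(row)}")
        label = int(row[0])
        if not 0 <= label < num_classes:
            raise ValueError(f"row {line_no}: label {label} outside [0, {num_classes})")
        num_rows += 1
    if width is None:
        raise ValueError("empty file")
    return num_rows, width - 1


# Toy input; for a real file use open(path, newline="").
print(check_dataset(io.StringIO("0,0.1,2.5\n1,3.0,-1.2\n"), num_classes=2))  # (2, 2)
```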

## Pre-trained Models

Pretrained FashionMNIST models for the optimal adversarial training setting are available in [saved_models](./saved_models/). You can extract them with:
```7z
7z e fmnist_saved.7z.001
```
```
7z e fmnist_saved_adv.7z.001
```

## Evaluation

To evaluate the saved MET-S model, run:
```eval
python3 eval.py --model_path="./saved_models/fmnist_64_1_64_6_1_70_1e-05" --model_path_linear="./saved_models/fmnist_linear_64_1_64_6_1_70_1e-05"
```

To evaluate the saved MET model, run:
```
python3 eval.py --model_path="./saved_models/fmnist_adv_64_1_64_6_1_70_1e-05" --model_path_linear="./saved_models/fmnist_linear_adv_64_1_64_6_1_70_1e-05"
```

By default, results are written to **met.csv**.