
# TAB-DRW

This repository contains the implementation of TAB-DRW: A DFT-based Robust Watermark for Generative Tabular Data.

TAB-DRW builds on [TabWak](https://github.com/chaoyitud/TabWak), so its installation and usage closely follow TabWak's. The installation steps below are adapted from TabWak's instructions.

## Installing Dependencies

**Python version**: 3.10

### Step 1: Create Environment

```bash
conda create -n tabsyn python=3.10
conda activate tabsyn
```

### Step 2: Install PyTorch

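Using `pip` (this installs the latest release, whereas the wheels in Step 4 are built for PyTorch 2.0.1 with CUDA 11.7):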

```bash
pip install torch torchvision torchaudio
```

Or via `conda`:

```bash
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
```

### Step 3: Install Other Dependencies

```bash
pip install -r requirements.txt
```

### Step 4: Install Dependencies for GOGGLE

```bash
pip install dgl -f https://data.dgl.ai/wheels/cu117/repo.html
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
```

### Step 5: Install Quality Metric Dependencies (synthcity)

Create a separate environment for the quality metrics:

```bash
conda create -n synthcity python=3.10
conda activate synthcity

pip install synthcity
pip install category_encoders
```

## Preparing Datasets

### Using the Datasets from the Paper

Download the raw dataset:

```bash
python download_dataset.py
```

Process the dataset:

```bash
python process_dataset.py
```

## Training Models

For Tabsyn, use the following commands for training:

1. Train the VAE model first:

    ```bash
    python main.py --dataname [NAME_OF_DATASET] --method vae --mode train
    ```

2. After the VAE is trained, train the diffusion model:

    ```bash
    python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode train
    ```
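
The two training stages above can be chained in a small wrapper script. The following is a minimal sketch, not part of the repository: the `run` helper, the `DRY_RUN` variable, and the `train_tabsyn` function are hypothetical names, and `adult` is used only as an example dataset name. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```bash
# Sketch: train the VAE, then the diffusion model, stopping on the first failure.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (default), otherwise run it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

train_tabsyn() {
    # $1: dataset name passed to --dataname
    # Stage 1: the VAE has to be trained before the diffusion model.
    run python main.py --dataname "$1" --method vae --mode train
    # Stage 2: the diffusion model.
    run python main.py --dataname "$1" --method tabsyn --mode train
}

# Use the first script argument as the dataset name, falling back to "adult".
train_tabsyn "${1:-adult}"
```

Because of `set -e`, the diffusion stage is skipped if the VAE stage exits with an error.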

## Watermark Embedding

First, generate a synthetic table with the command below. Watermark embedding is performed during generation, for both sample-phase and post-editing watermarking:

```bash
python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode sample --steps 1000 --with_w [Name_of_Watermark] --num_samples [Number_of_Rows]
```

**[Name_of_Watermark] options**: `TAB-DRW`, `GLW`, `tabmark`, `TabWak_op`, `muse`

Once sampling is done, you can apply post-editing watermarking methods directly to the generated tables with the same command as above; there is no need to sample again.

## Watermark Detection

For watermark detection, use:

```bash
python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode detect --steps 1000 --with_w [Name_of_Watermark] --num_samples [Number_of_Rows]
```

**[Name_of_Watermark] options**: `TAB-DRW`, `GLW`, `tabmark`, `TabWak_op`, `muse`
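
To compare several watermarks, the embedding and detection commands can be wrapped in a loop. The sketch below is an illustration under assumptions: `run`, `DRY_RUN`, `embed_and_detect`, `DATANAME`, and `NUM_SAMPLES` are hypothetical names, and the row count of 1000 is a placeholder rather than a value from the paper. It prints the commands by default; set `DRY_RUN=0` to execute them.

```bash
# Sketch: run embedding (sample mode) and then detection for each watermark.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (default), otherwise run it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

DATANAME="${DATANAME:-adult}"        # example dataset name
NUM_SAMPLES="${NUM_SAMPLES:-1000}"   # placeholder row count

embed_and_detect() {
    # $1: watermark name passed to --with_w
    for mode in sample detect; do
        run python main.py --dataname "$DATANAME" --method tabsyn --mode "$mode" \
            --steps 1000 --with_w "$1" --num_samples "$NUM_SAMPLES"
    done
}

# Watermark names are the options listed above.
for wm in TAB-DRW GLW tabmark TabWak_op muse; do
    embed_and_detect "$wm"
done
```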


## Attacks on Watermarked Data

To run attacks on watermarked data, use:

```bash
python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode detect --steps 1000 --with_w [Name_of_Watermark] --num_samples [Number_of_Rows] --attack [Name_of_Attack_Options] --attack_percentage [0 to 1]
```

**[Name_of_Attack_Options]**: `rowdeletion`, `celldeletion`, `celldeletetion`, `noise`, `shuffle`
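
A robustness study typically repeats the command above over several attack strengths. The sketch below shows one way to do that; `run`, `DRY_RUN`, `attack_sweep`, and the environment variables are hypothetical names, and the percentages `0.1 0.2 0.4` and the row count are placeholders, not settings from the paper. It prints the commands by default; set `DRY_RUN=0` to execute them.

```bash
# Sketch: sweep --attack_percentage for a few of the attacks listed above.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (default), otherwise run it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

DATANAME="${DATANAME:-adult}"        # example dataset name
WATERMARK="${WATERMARK:-TAB-DRW}"
NUM_SAMPLES="${NUM_SAMPLES:-1000}"   # placeholder row count

attack_sweep() {
    # $1: attack name; remaining arguments: attack percentages in [0, 1]
    name="$1"
    shift
    for pct in "$@"; do
        run python main.py --dataname "$DATANAME" --method tabsyn --mode detect \
            --steps 1000 --with_w "$WATERMARK" --num_samples "$NUM_SAMPLES" \
            --attack "$name" --attack_percentage "$pct"
    done
}

for attack in rowdeletion noise shuffle; do
    attack_sweep "$attack" 0.1 0.2 0.4
done
```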


## Evaluation on Watermarked Data

To run evaluation on watermarked data, use:

```bash
python eval/eval_density.py --dataname [NAME_OF_DATASET] --method tabsyn --path [Path_for_data_storage]
```

```bash
python eval/eval_detection.py --dataname [NAME_OF_DATASET] --method tabsyn --path [Path_for_data_storage]
```

```bash
python eval/eval_mle.py --dataname [NAME_OF_DATASET] --method tabsyn --path [Path_for_data_storage]
```

**[Path_for_data_storage] example**: `synthetic/adult/TAB-DRW/-1/0/w-num-tabsyn.csv`
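
The three evaluation scripts take the same arguments, so they can be run in one pass over a single synthetic table. This is a minimal sketch: `run`, `DRY_RUN`, `evaluate_all`, and `SYN_PATH` are hypothetical names, and the default path is simply the example path given above. Depending on your setup, some scripts may need to be run from the `synthcity` environment created in Step 5. The script prints the commands by default; set `DRY_RUN=0` to execute them.

```bash
# Sketch: run the density, detection, and MLE evaluations on one synthetic table.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (default), otherwise run it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

DATANAME="${DATANAME:-adult}"   # example dataset name
SYN_PATH="${SYN_PATH:-synthetic/adult/TAB-DRW/-1/0/w-num-tabsyn.csv}"

evaluate_all() {
    # $1: dataset name, $2: path to the synthetic CSV
    for metric in density detection mle; do
        run python "eval/eval_${metric}.py" --dataname "$1" --method tabsyn --path "$2"
    done
}

evaluate_all "$DATANAME" "$SYN_PATH"
```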
