# Spoofing fingerprints

## Requirements

To install requirements:

```setup
conda env create --file=env.yaml
conda activate spoofDetect
```
Depending on your GPU setup (for example, the installed CUDA version), you may need to adjust the package versions in `env.yaml`.

You also need the `watermark_stealing` package. Its code was forked from https://github.com/eth-sri/watermark-stealing/tree/main, and all credit goes to the original authors.
To use a watermark-stealing model, you need to download additional files; see `watermark-stealing/README.md`.

Install `watermark_stealing` as an editable pip package:


```
cd watermark-stealing
pip install -e .
```

Optionally, you can install FlashAttention (the `flash-attn` pip package):
```
pip install flash-attn --no-build-isolation
```


## Reproducing the experiments

To reproduce the experiments from the paper, first set up a config file for your model, then run:

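```
bash reprompting_pipeline.sh <config> <learning> <num_queries> <dataset> <spoof_only>
```

where:

- `<config>`: path to your config file.
- `<learning>`: `Y` for Learning, `N` for Stealing.
- `<num_queries>`: number of queries.
- `<dataset>`: either `c4` or `dolly`.
- `<spoof_only>`: `Y` to generate only spoofed text, `N` to generate both spoofed and xi-watermarked text.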
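Because the script takes five positional arguments, several of which are `Y`/`N` flags, it is easy to swap two of them by mistake. Below is a minimal sketch of a hypothetical helper, `run_reprompting_pipeline`, which is not part of this repository. It checks the argument values listed for `reprompting_pipeline.sh` before calling the script. The helper name and the `DRY_RUN` variable are assumptions introduced for illustration.

```bash
# Hypothetical helper, NOT part of this repository: checks the five positional
# arguments of reprompting_pipeline.sh before launching it.
# Set DRY_RUN=1 to print the command instead of running it.
run_reprompting_pipeline() {
  if [ "$#" -ne 5 ]; then
    echo "usage: run_reprompting_pipeline CONFIG Y|N NUM_QUERIES c4|dolly Y|N" >&2
    return 2
  fi
  local cfg="$1" learning="$2" num_queries="$3" dataset="$4" spoof_only="$5"

  # Y = Learning, N = Stealing
  case "$learning" in Y|N) ;; *) echo "learning flag must be Y or N" >&2; return 2 ;; esac
  # Number of queries must be a non-negative integer
  case "$num_queries" in ''|*[!0-9]*) echo "number of queries must be an integer" >&2; return 2 ;; esac
  case "$dataset" in c4|dolly) ;; *) echo "dataset must be c4 or dolly" >&2; return 2 ;; esac
  # Y = spoofed text only, N = spoofed and xi-watermarked text
  case "$spoof_only" in Y|N) ;; *) echo "spoof-only flag must be Y or N" >&2; return 2 ;; esac

  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
  ${DRY_RUN:+echo} bash reprompting_pipeline.sh "$cfg" "$learning" "$num_queries" "$dataset" "$spoof_only"
}
```

For example, `DRY_RUN=1 run_reprompting_pipeline <your_config> N <num_queries> c4 N` prints the Stealing-mode command for C4 without running it; drop `DRY_RUN=1` to launch it.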

`reprompting_pipeline.sh` generates text with both the Reprompting and the Normal methods.
To compute the p-values, run:

```
python generate_pvalues.py --cfg_path "path to your config" --reprompting "Y/N" --dataset "c4/dolly" --token_target "value for T"
```
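Since the pipeline produces both Reprompting and Normal outputs, you will likely want p-values for both. The sketch below is a hypothetical helper, `compute_pvalues_both`, which is not part of this repository. It runs `generate_pvalues.py` once with `--reprompting Y` and once with `--reprompting N`. It assumes that `Y` scores the Reprompting outputs and `N` the Normal outputs; check this against the script before relying on it. The helper name and the `DRY_RUN` variable are illustrative assumptions.

```bash
# Hypothetical helper, NOT part of this repository: runs generate_pvalues.py
# once with --reprompting Y and once with --reprompting N.
# ASSUMPTION: Y scores the Reprompting outputs and N the Normal outputs.
# Set DRY_RUN=1 to print the commands instead of running them.
compute_pvalues_both() {
  if [ "$#" -ne 3 ]; then
    echo "usage: compute_pvalues_both CONFIG c4|dolly T" >&2
    return 2
  fi
  local cfg="$1" dataset="$2" token_target="$3" reprompting
  case "$dataset" in c4|dolly) ;; *) echo "dataset must be c4 or dolly" >&2; return 2 ;; esac

  for reprompting in Y N; do
    # Stop at the first failing run and propagate its exit status
    ${DRY_RUN:+echo} python generate_pvalues.py --cfg_path "$cfg" \
      --reprompting "$reprompting" --dataset "$dataset" \
      --token_target "$token_target" || return
  done
}
```

Usage: `compute_pvalues_both <your_config> c4 <T>`, with `<T>` being the same value you would pass to `--token_target`.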

The generated p-values are saved as `.csv` files in the `data/pvalues` folder.
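A quick sanity check on these files is to count how many p-values fall below a significance level α; under a standard hypothesis test, those texts would be flagged as watermarked. The sketch below is a hypothetical helper, `count_below_alpha`, which is not part of this repository. It assumes that each file has a header row and that you pass the name of the p-value column as it appears in your files (no column name is taken from the repository). It also assumes that no field contains an embedded comma; if your files quote free text, use a CSV-aware tool instead.

```bash
# Hypothetical helper, NOT part of this repository: counts how many rows of a
# p-value CSV have a value below ALPHA in the given COLUMN.
# ASSUMPTIONS: first row is a header; no field contains an embedded comma.
count_below_alpha() {
  if [ "$#" -ne 3 ]; then
    echo "usage: count_below_alpha FILE COLUMN ALPHA" >&2
    return 2
  fi
  awk -F, -v col="$2" -v alpha="$3" '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {                               # locate the requested column
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { missing = 1; exit 2 }
      next
    }
    $c != "" { total++; if ($c + 0 < alpha) below++ }
    END {
      if (missing) { print "column not found: " col > "/dev/stderr"; exit 2 }
      printf "%d/%d below %s\n", below, total, alpha
    }
  ' "$1"
}
```

Inspect the header first with `head -n 1 data/pvalues/<file>.csv`, then run, for example, `count_below_alpha data/pvalues/<file>.csv <column> 0.05`. The output has the form `below/total below alpha`.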

All config files used for the experiments in the paper are in `configs/generated`.

Additionally, a configuration file generator is available in `generate_configs.ipynb`.
