# Code


This folder contains the code for our submission. Our code is based on the [MarkLLM](https://github.com/THU-BPM/MarkLLM) framework, which is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). We modified the code in the `watermark` folder to add our attack.


## Folder structure

- `config`: This folder contains the config files for the watermarking methods used in our submission.
- `watermark`: This folder contains the code for the watermarking methods. The code comes from the original [MarkLLM](https://github.com/THU-BPM/MarkLLM) framework, modified to add our attack.

## Our attack file
The implementation of our attack can be found in `attack_processor.py`. Note that although the code takes the full logits from the target model as input, it only uses the probabilities of the most likely and the 20th most likely next tokens to calculate the signal. Passing the full logits is purely for simplicity and efficiency of the current implementation.
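As an illustration of which values the attack reads, the following minimal sketch extracts the two probabilities from a vector of next-token logits. It is not the code in `attack_processor.py`: the function name `top1_and_kth_prob` is hypothetical, it works on a plain Python list instead of a tensor, and it stops before the signal computation, which is not reproduced here.

```python
import math


def top1_and_kth_prob(logits, k=20):
    """Return (p_top1, p_kth) for a list of raw next-token logits.

    Hypothetical helper for illustration only; the real attack operates on
    model logits inside a LogitsProcessor.
    """
    if len(logits) < k:
        raise ValueError(f"need at least {k} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Sort probabilities in descending order and pick rank 1 and rank k.
    probs = sorted((e / total for e in exps), reverse=True)
    return probs[0], probs[k - 1]


# Example: a toy vocabulary of 25 tokens with increasing logits.
toy_logits = [float(i) for i in range(25)]
p_top1, p_20th = top1_and_kth_prob(toy_logits)
```

Only these two numbers would be needed from the target model at each step; everything else in the logits vector is ignored.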

## How to run
To run the code, first install the dependencies listed in `requirements.txt`.

```bash
pip install -r requirements.txt
```

The results for the OPT model can be generated by running the following command:

```bash
bash run_opt.sh
```

The results for the LLaMA model can be generated by running the following command:

```bash
bash run_llama.sh
```


## About GPT-3.5
To run the paraphrase attack based on GPT-3.5, you need to set your own API key in the `main.py` file.

## About XSIR, UPV, SIR
To run these baselines, additional models need to be downloaded. See the instructions in the [MarkLLM framework](https://github.com/THU-BPM/MarkLLM/blob/main/MarkLLM_demo.ipynb).


## Modification to the MarkLLM framework
We modified the code in the `watermark` folder to add our attack. In each watermark file, the function `generate_watermarked_text` now takes an additional parameter, `attack_processor`, which is a `LogitsProcessor`. If the parameter is not `None`, the attack processor is added to the logits processors used in `generate_watermarked_text`.
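The optional-parameter pattern can be sketched as follows. This is a simplified stand-in, not the code in the `watermark` folder: it assumes the attack processor is appended only when `attack_processor` is not `None`, it assumes the attack runs after the watermark processor, and it uses a plain list of callables in place of a real logits-processor list. The helper names `build_logits_processors` and `apply_processors` are hypothetical.

```python
def build_logits_processors(watermark_processor, attack_processor=None):
    """Collect the processors to apply at each generation step.

    Assumption: the attack processor is optional and, when given, is
    applied after the watermark processor.
    """
    processors = [watermark_processor]
    if attack_processor is not None:
        processors.append(attack_processor)
    return processors


def apply_processors(processors, input_ids, scores):
    """Apply each processor in order, mimicking a logits-processor chain."""
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores


# Toy processors with the (input_ids, scores) -> scores call shape.
def toy_watermark(input_ids, scores):
    return [s + 1.0 for s in scores]


def toy_attack(input_ids, scores):
    return [s * 2.0 for s in scores]


plain = apply_processors(build_logits_processors(toy_watermark), [], [1.0])
attacked = apply_processors(
    build_logits_processors(toy_watermark, toy_attack), [], [1.0]
)
```

With this design, existing callers that do not pass `attack_processor` keep the original watermark-only behavior.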