# Watermark-LLM

This repository implements the WAtermarking for Source Attribution (WASA) framework from the paper "Source Attribution for Large Language Model-Generated Data".

## Installation
1. Set up a conda environment:
```
$ conda create -n env_name python=3.11
$ conda activate env_name
```

2. Install the required packages in the conda environment:
```
$ conda install --file requirements.txt
```

3. Several required packages are not available through conda. Install them with pip from the separate requirements file:
```
$ pip install -r requirements_pip.txt
```

## Usage
The code to watermark the dataset is in embed_watemark.py. To run it, download the dataset into data/ and modify the corresponding path in embed_watemark.py.
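
For orientation, the idea behind watermarking a dataset for source attribution can be sketched as follows. This is a minimal illustration and not the code in embed_watemark.py: the invisible-character alphabet, the watermark length, the choice to append the watermark at the end of a sentence, and all function names are assumptions made for this example.

```python
# Illustrative sketch only; NOT the logic in embed_watemark.py.
# Assumptions: the alphabet, the watermark length, and the insertion
# position (end of sentence) are chosen for this example.
ALPHABET = ["\u200b", "\u200c", "\u200d", "\u2062", "\u2063", "\u2064"]
WATERMARK_LEN = 10


def id_to_watermark(source_id: int) -> str:
    """Encode a source ID as a fixed-length base-6 string of invisible characters."""
    base = len(ALPHABET)
    if not 0 <= source_id < base ** WATERMARK_LEN:
        raise ValueError("source_id out of range")
    digits = []
    for _ in range(WATERMARK_LEN):
        source_id, digit = divmod(source_id, base)
        digits.append(ALPHABET[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def watermark_to_id(watermark: str) -> int:
    """Decode a string of invisible characters back into the source ID."""
    value = 0
    for ch in watermark:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value


def embed(sentence: str, source_id: int) -> str:
    """Append the watermark for source_id to the end of a sentence."""
    return sentence + id_to_watermark(source_id)


def extract(text: str) -> int:
    """Recover the source ID from the last WATERMARK_LEN invisible characters."""
    chars = [ch for ch in text if ch in ALPHABET]
    return watermark_to_id("".join(chars[-WATERMARK_LEN:]))
```

In this sketch, `extract(embed("some text", 42))` recovers the ID `42`, while the watermarked sentence looks identical to the original when displayed.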

The main code is in run_lm_finetuning_final.py. To train a WASA-LLM, run the .sh scripts under the scripts folder with any necessary changes:
```
$ . scripts/run_watermark_finetune.sh 
```
Scripts for reproducing the evaluation results are also included in the scripts folder. The implementation of the baseline (the original GPT) is in the baseline folder.
   
