
### Semantic Invariant Robust Watermark

#### Step 1: Generate embeddings for training the watermark model
```
python generate_embeddings.py --input_filename wiki-text-103.jsonl --output_filename cbert-embeddings-new.txt --model_path compositional-BERT
```

#### Step 2: Train the watermark model
```
python train_embedding_network.py --input_file cbert-embeddings-new.txt --output_model transform_model_cbert.pth --input_dim 1024
python generate_mapping.py --model_name llama-7b --size 50272 --dimension 300
```

#### Step 3: Generate text with the watermark model
```
python generate_watermarked_text.py --watermark_type context --output_path "llama_text/context/llama-100_window_sample.json" --base_model llama-7b --delta 1 --generate_number 200 --chunk_size 7 --decode_method sample --mapping_file "mapping/mapping_llama-7b.json"
```

#### Generated data

The generated data for our method and for all baseline methods used in the paper are listed below.

* Our method: generated_data/SIR(Ours)-text-with-dipper-L60.json
* KGW-1: generated_data/kGW1-text-with-dipper-L60.json
* KGW-2: generated_data/kGW2-text-with-dipper-L60.json
* KGW-4: generated_data/kGW4-text-with-dipper-L60.json
* EXP-Edit: generated_data/EXP-Edit-text-with-dipper-L60.json
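
To check that the files listed above are present and to see how they are organized, a small script along the following lines may help. It is only a sketch: the schema of these files is not documented here, so the script assumes nothing about field names and reports only the top-level structure it finds. The helpers `load_records` and `summarize` are hypothetical names introduced for this example and are not part of the repository.

```python
import json
from pathlib import Path

# Paths copied from the list above; adjust them if your checkout differs.
GENERATED_FILES = [
    "generated_data/SIR(Ours)-text-with-dipper-L60.json",
    "generated_data/kGW1-text-with-dipper-L60.json",
    "generated_data/kGW2-text-with-dipper-L60.json",
    "generated_data/kGW4-text-with-dipper-L60.json",
    "generated_data/EXP-Edit-text-with-dipper-L60.json",
]


def load_records(path):
    """Load a file as one JSON document, falling back to JSON Lines."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: a file that is not a single JSON document
        # holds one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(path):
    """Return a one-line description of a file's top-level structure."""
    if not Path(path).is_file():
        return f"{path}: missing"
    data = load_records(path)
    if isinstance(data, list):
        detail = f"list of {len(data)} items"
        if data and isinstance(data[0], dict):
            detail += f", first item keys: {sorted(data[0])}"
    elif isinstance(data, dict):
        detail = f"object with keys: {sorted(data)}"
    else:
        detail = type(data).__name__
    return f"{path}: {detail}"


if __name__ == "__main__":
    for name in GENERATED_FILES:
        print(summarize(name))
```

Run it from the repository root so that the relative `generated_data/` paths resolve. Files that are not found are reported as `missing` rather than raising an error.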
