Overview

This repository contains code for robust detection of watermarks in text generated by large language models (LLMs) after human edits. The focus is on keeping detection reliable when the text has undergone human-like modifications such as word deletions, substitutions, and insertions. The methods are designed for streaming text scenarios, where watermarked content may appear only intermittently.

Li's Code for Data Generation

The code in this repository is based on the paper "Robust detection of watermarks for large language models under human edits" by Li, X., Ruan, F., Wang, H., Long, Q., and Su, W. J. (Journal of the Royal Statistical Society Series B: Statistical Methodology, 2025). Li's codebase is used to generate data with different language models and watermark schemes. The language models are:



1. OPT-1.3B

2. Llama-2.7B (Sheared version)

3. Qwen-2.5-3B



The watermark schemes tested include:



Gumbel-Max

Inverse Transform
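
For readers unfamiliar with the Gumbel-Max scheme, the sketch below illustrates the general idea: the token is chosen as the argmax of u_w^(1/p_w) over the vocabulary, and detection looks at the pseudorandom number attached to each chosen token. This is a minimal illustration under assumptions, not the code used in this repository. All function names are hypothetical, the random numbers come from a plain NumPy generator rather than a keyed hash, and the sum-based score is a simple baseline that may differ from the detector used in the paper.

```python
import numpy as np


def gumbel_max_sample(probs, u):
    """Return the index of the token maximizing u_w ** (1 / p_w)."""
    probs = np.asarray(probs, dtype=float)
    u = np.asarray(u, dtype=float)
    # log(u) / p is a monotone transform of u ** (1 / p) and is numerically safer.
    scores = np.full(probs.shape, -np.inf)
    positive = probs > 0
    scores[positive] = np.log(u[positive]) / probs[positive]
    return int(np.argmax(scores))


def pivotal_statistics(tokens, uniforms):
    """Y_t = U[t, w_t]: uniform on (0, 1) without a watermark, skewed toward 1 with one."""
    uniforms = np.asarray(uniforms)
    return uniforms[np.arange(len(tokens)), tokens]


def sum_score(pivots):
    """Sum of -log(1 - Y_t); larger values suggest a watermark."""
    return float(np.sum(-np.log1p(-np.asarray(pivots))))


def demo(seed=0, vocab_size=50, length=200):
    """Compare the score of watermarked tokens with that of independently sampled tokens."""
    rng = np.random.default_rng(seed)
    # Stand-ins (assumptions): random next-token distributions and random "key" numbers.
    probs = rng.dirichlet(np.ones(vocab_size), size=length)
    uniforms = rng.random((length, vocab_size))
    watermarked = [gumbel_max_sample(p, u) for p, u in zip(probs, uniforms)]
    plain = [int(rng.choice(vocab_size, p=p)) for p in probs]
    return (
        sum_score(pivotal_statistics(watermarked, uniforms)),
        sum_score(pivotal_statistics(plain, uniforms)),
    )
```

Calling `demo()` returns the score of the watermarked sequence followed by the score of the unwatermarked one. The first should be clearly larger, which is the gap a detector exploits.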



Generating the data with these models and watermark schemes took 30 hours. The results are saved as .pkl files in the raw_data/ directory and serve as input to the analysis scripts, which process and visualize them.
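
To inspect the generated files before running any analysis, a small loader along the following lines may help. It is only a sketch: the function name `load_raw_data` is hypothetical, and it makes no assumption about what each .pkl file contains beyond it being a valid pickle. Because unpickling can execute arbitrary code, it should only be used on files you generated yourself or otherwise trust.

```python
import pickle
from pathlib import Path


def load_raw_data(data_dir="raw_data"):
    """Return {file stem: unpickled object} for every .pkl file in data_dir."""
    results = {}
    for path in sorted(Path(data_dir).glob("*.pkl")):
        with path.open("rb") as handle:
            # Only unpickle trusted files: pickle can run arbitrary code on load.
            results[path.stem] = pickle.load(handle)
    return results
```

For example, `for name, obj in load_raw_data().items(): print(name, type(obj).__name__)` lists each file together with the type of object it stores.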

Available Analysis Scripts

Once the data has been generated, the following Python scripts analyze the .pkl files stored in the raw_data/ directory. Each script examines the impact of a different factor and produces visualizations and statistical results.



1. AttackAnalysis.py: Analyzes the effect of various attacks (such as human-like edits) on watermark detection performance.

2. HyperparameterAnalysis.py: Analyzes the impact of different hyperparameters (such as model type and watermark settings) on watermark detection performance.

3. SparsityAnalysis.py: Analyzes the effect of sparsity in the watermark signal on detection accuracy.

4. StreamAnalysis.py: Analyzes the performance of watermark detection in a streaming scenario, where text is processed in a sequential manner.
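
As an illustration of what a human-like edit attack looks like, the sketch below randomly deletes, substitutes, or inserts tokens in a sequence. It is a hypothetical example, not the implementation in AttackAnalysis.py: the function name `apply_random_edits`, the `edit_rate` parameter, and the choice to pick the three edit types with equal probability are all assumptions made for this example.

```python
import random


def apply_random_edits(tokens, vocab, edit_rate=0.1, seed=None):
    """Return a copy of tokens where each position is edited with probability edit_rate."""
    rng = random.Random(seed)
    edited = []
    for token in tokens:
        if rng.random() >= edit_rate:
            edited.append(token)  # leave this token untouched
            continue
        kind = rng.choice(("delete", "substitute", "insert"))
        if kind == "delete":
            continue  # drop the token
        if kind == "substitute":
            edited.append(rng.choice(vocab))  # replace it with a random token
        else:
            edited.append(token)  # keep it and insert a random token after it
            edited.append(rng.choice(vocab))
    return edited
```

With `edit_rate=0` the sequence comes back unchanged, and raising `edit_rate` dilutes the watermark signal, which is the kind of degradation a robust detector has to tolerate.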



These scripts will generate figures and analysis data, which will be saved for further evaluation.

To run an analysis script, invoke it with Python. For example, to analyze the effect of attacks on watermark detection:

    python AttackAnalysis.py
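
To run all four analyses in one go, a small driver such as the following could be used. It is a sketch that assumes the scripts sit in the current working directory and take no command-line arguments, as in the command above. The `run_all` helper and its `runner` parameter are hypothetical and exist only for this example.

```python
import subprocess
import sys

SCRIPTS = [
    "AttackAnalysis.py",
    "HyperparameterAnalysis.py",
    "SparsityAnalysis.py",
    "StreamAnalysis.py",
]


def run_all(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        runner([sys.executable, script], check=True)
```

Calling `run_all()` from the repository root executes the scripts in the listed order.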

