## Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

This is the official repository for our NeurIPS submission, "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense". This code is for reference only; all model and data links have been anonymized, and we will open-source them after acceptance.

### Running the paraphraser model (DIPPER)

**Requirements**

Since DIPPER is an 11B-parameter model, please use a GPU with at least 40GB of memory to reproduce the experiments in the paper. Lower-precision approximations or DeepSpeed optimizations may also work on lower-memory GPUs, but we have not tested them in our experiments.

```
# required (for paraphrasing)
pip install torch transformers scikit-learn nltk
pip install --editable .

# optional (needed for some detection experiments)
pip install openai rankgen retriv sentencepiece
```

**Model Download**

*DIPPER manual download*

Checkpoint: [link](anonymous-link-will-be-released-after-acceptance)  
To run this downloaded model, in [`dipper_paraphrases/paraphrase_minimal.py`](dipper_paraphrases/paraphrase_minimal.py), uncomment the line `dp = DipperParaphraser(model="...")` and specify your model checkpoint path.

*SIM model*

You can optionally download the SIM model from Wieting et al. 2021 to compute the semantic similarity of the paraphrased outputs. Download the two files from [this link](anonymous-link-will-be-released-after-acceptance) and place them in [`dipper_paraphrases/sim`](dipper_paraphrases/sim).

**Verify DIPPER is working**

Please run the script [`dipper_paraphrases/paraphrase_minimal.py`](dipper_paraphrases/paraphrase_minimal.py) and compare the outputs with [`sample_outputs.md`](sample_outputs.md). The greedy-decoded outputs should match exactly, while the top_p samples will differ somewhat from the sample outputs but show higher diversity.

**(IMPORTANT) paraphraser differences from paper**

There are two minor differences between the actual model and the paper's description:

1. Our model uses `<sent> ... </sent>` tags instead of `<p> ... </p>` tags.

2. The lexical and order diversity codes used by the actual model correspond to "similarity" rather than "diversity". For a diversity of X, please use the control code value `100 - X`. In other words, L60-O60 in the paper corresponds to `lex = 40, order = 40` as the control code input to the model.

This is all documented in our minimal sample script to run DIPPER, [`dipper_paraphrases/paraphrase_minimal.py`](dipper_paraphrases/paraphrase_minimal.py).
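As a quick illustration of the second difference, the conversion from the paper's diversity values to the model's control codes can be sketched as below. The helper name `paper_to_model_codes` is hypothetical and is not part of this repository; it only encodes the `100 - X` rule stated above.

```python
def paper_to_model_codes(lex_diversity: int, order_diversity: int) -> dict:
    """Map the paper's diversity values (e.g. L60-O60) to the
    similarity-style control codes expected by the released model.

    Hypothetical helper for illustration only; it is not defined in this repo.
    """
    for name, value in (("lex_diversity", lex_diversity),
                        ("order_diversity", order_diversity)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # The model's codes express similarity, so a diversity of X becomes 100 - X.
    return {"lex": 100 - lex_diversity, "order": 100 - order_diversity}


# L60-O60 in the paper corresponds to lex = 40, order = 40 for the model.
codes = paper_to_model_codes(60, 60)
print(codes)
```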

### Reproducing experiments in the paper

Dataset: Download the folders `open-generation-data` and `lfqa-data` from [this Google Drive link](anonymous-link-will-be-released-after-acceptance) and place them in your root folder. Reproducing the experiments in the paper takes three steps. We have already run Step 1 and Step 2 and added the preprocessed data to the Google Drive link.

**Step 1: Generating text from large language models**

Use the scripts [`dipper_paraphrases/generate_gpt2.py`](dipper_paraphrases/generate_gpt2.py), [`dipper_paraphrases/generate_gpt3.py`](dipper_paraphrases/generate_gpt3.py), or [`dipper_paraphrases/generate_opt.py`](dipper_paraphrases/generate_opt.py) as shown below:

```
# for no watermarking
python dipper_paraphrases/generate_gpt2.py --strength 0.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data

# for including watermarking
python dipper_paraphrases/generate_gpt2.py --strength 2.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data
```

You can speed this up by parallelizing it across multiple GPUs on SLURM using the code below. Please read the script before using parallelization, as it will likely need modifications depending on your specific SLURM setup.

```
python dipper_paraphrases/parallel/schedule.py --command "python dipper_paraphrases/generate_gpt2.py --strength 0.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data" --partition gpu-preempt --num_shards 8

# after completion
python dipper_paraphrases/parallel/merge.py --input_pattern "lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl.shard_*"
```

**Step 2: Paraphrasing text generated by large language models**

Use the script [`dipper_paraphrases/paraphrase.py`](dipper_paraphrases/paraphrase.py) as shown below:

```
python dipper_paraphrases/paraphrase.py --output_file lfqa-data/gpt2_xl_strength_2.0_frac_0.5_300_len_top_p_0.9.jsonl --model dipper-paraphraser-xxl
```

You can also parallelize this in the same way as Step 1.

**Step 3: Run AI-text detectors**

Use any of the [`dipper_paraphrases/detect_*.py`](dipper_paraphrases) scripts to run the various detectors, as follows. Each script caches its processed data (such as API call results), so subsequent runs will be much faster. Note that the GPTZero and OpenAI experiments need API keys; see [`dipper_paraphrases/utils.py`](dipper_paraphrases/utils.py) for details.

```
python dipper_paraphrases/detect_watermark.py --output_file lfqa-data/gpt2_xl_strength_2.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/watermark_cache.json
python dipper_paraphrases/detect_openai.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/openai_cache.json
python dipper_paraphrases/detect_gptzero.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/gptzero_cache.json
python dipper_paraphrases/detect_detectgpt.py --base_model "facebook/opt-13b" --output_file lfqa-data/opt_13b_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/detectgpt_cache_opt.json
python dipper_paraphrases/detect_rankgen.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/rankgen_cache.json
python dipper_paraphrases/detect_retrieval.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --retrieval_corpus pooled --technique bm25
```
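To give a rough idea of why repeated runs are faster, here is a minimal sketch of a JSON-backed detector cache. It is an illustration only: the function names, the cache layout (a flat text-to-score dictionary), and the `fake_detector` stand-in are assumptions, and the actual scripts may organize their `--detector_cache` files differently.

```python
import json
import os
import tempfile


def load_cache(path):
    """Return previously saved results from `path`, or an empty dict."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    return {}


def cached_score(text, score_fn, cache, path):
    """Score `text` with `score_fn`, reusing and persisting cached results."""
    if text not in cache:
        cache[text] = score_fn(text)  # the expensive step, e.g. an API request
        with open(path, "w") as f:
            json.dump(cache, f)
    return cache[text]


# Stand-in for a real detector; it records how often it is actually called.
calls = []

def fake_detector(text):
    calls.append(text)
    return len(text.split()) / 100.0


cache_path = os.path.join(tempfile.mkdtemp(), "detector_cache.json")
first = cached_score("some candidate text", fake_detector,
                     load_cache(cache_path), cache_path)
# Reloading the cache from disk simulates a second run of the script:
# the score is read from the file and the detector is not called again.
second = cached_score("some candidate text", fake_detector,
                      load_cache(cache_path), cache_path)
print(first, second, len(calls))
```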

We recommend reporting true positive rates at a false positive rate of 1% instead of ROC curves, as discussed in the paper. The detection scripts print this value. The full ROC curves are nevertheless stored in `roc_plots`; use [`dipper_paraphrases/plot_roc.py`](dipper_paraphrases/plot_roc.py) to plot them.
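For readers unfamiliar with this metric, the sketch below shows one simple way to compute a true positive rate at a fixed false positive rate from raw detector scores. It is not the repository's implementation: the function name `tpr_at_fpr` is hypothetical, and it assumes that higher scores mean "more likely AI-generated" and that a text is flagged when its score is strictly above the threshold.

```python
def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Return (TPR, threshold) where the threshold is chosen so that at most
    `target_fpr` of the human-written texts are flagged as AI-generated.

    Assumption: higher score = more likely AI-generated.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")
    ranked_human = sorted(human_scores, reverse=True)
    # Number of human-written texts we are allowed to misclassify.
    allowed_false_positives = int(target_fpr * len(ranked_human))
    # Clamp the index so that target_fpr = 1.0 stays within the list.
    index = min(allowed_false_positives, len(ranked_human) - 1)
    threshold = ranked_human[index]
    # Only scores strictly above the threshold are flagged.
    flagged = sum(score > threshold for score in ai_scores)
    return flagged / len(ai_scores), threshold


# Toy scores, not results from the paper.
human = [0.1, 0.2, 0.3, 0.9]
ai = [0.5, 0.95, 0.25, 0.8]
tpr, threshold = tpr_at_fpr(human, ai, target_fpr=0.25)
print(tpr, threshold)
```

With only four human-written examples, a 1% target would allow no false positives at all, which is why the toy call uses a 25% target; with realistically sized evaluation sets the default `target_fpr=0.01` applies directly.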

Since DetectGPT takes a while to run, it may be helpful to shard the DetectGPT experiments using the parallel scripts from the previous two steps. Use [`dipper_paraphrases/parallel/merge_json.py`](dipper_paraphrases/parallel/merge_json.py) to merge the cache. Set `--base_model none` to skip loading the LLM and rely only on cached results. Also, don't forget the `--base_model` flag in DetectGPT runs; see the code for more details.

For the scaled retrieval experiments, please see [`dipper_paraphrases/detect_retrieval_scale_*.py`](dipper_paraphrases).
