# Deep Kernel Relative Test for Machine-generated Text Detection 

## This branch contains test-only code. Please refer to the `train_bkp` branch for the full code.

Official PyTorch implementation of the ICLR 2025 paper:

**[Non-parametric Kernel Relative Test for Machine-generated Text Detection](https://openreview.net/forum?id=z9j7wctoGV)**


Abstract: *Recent studies demonstrate that two-sample test can effectively detect machine-generated texts (MGTs) with excellent adaptation ability to texts generated by newer LLMs. However, the two-sample test-based detection relies on the assumption that human-written texts (HWTs) must follow the distribution of seen HWTs. As a result, it tends to make mistakes in identifying HWTs that deviate from the *seen HWT* distribution, limiting their use in sensitive areas like academic integrity verification. To address this issue, we propose to employ *non-parametric kernel relative test* to detect MGTs by testing whether it is statistically significant that the distribution of *a text to be tested* is closer to the distribution of HWTs than to the distribution of MGTs. We further develop a *kernel optimisation* algorithm in relative test to select the best kernel that can enhance the testing capability for MGT detection. As relative test does not assume that a text to be tested must belong exclusively to either MGTs or HWTs, it can largely *reduce the false positive error* compared to two-sample test, offering significant advantages in practical use. Extensive experiments demonstrate the superior detection performance of our method, compared to state-of-the-art non-parametric and parametric detectors.* 
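
To make the idea of a relative test concrete, here is a minimal sketch in plain Python. It is not the implementation from this repository: it uses toy 2-D Gaussian vectors instead of text features, a fixed Gaussian kernel instead of the optimised deep kernel, and it only compares two biased MMD estimates without the significance test described in the paper. All function names are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))


def relative_statistic(test, hwt_ref, mgt_ref, sigma=1.0):
    """MMD^2(test, HWT) - MMD^2(test, MGT).

    A negative value means the test sample is closer to the HWT reference
    than to the MGT reference under this kernel.
    """
    return mmd2(test, hwt_ref, sigma) - mmd2(test, mgt_ref, sigma)


def sample(mean, n, rng, dim=2):
    """Toy stand-in for text features: n Gaussian vectors around `mean`."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
hwt_ref = sample(0.0, 40, rng)       # stand-in for the HWT feature ref
mgt_ref = sample(3.0, 40, rng)       # stand-in for the MGT feature ref
human_like = sample(0.0, 40, rng)    # test sample resembling HWTs
machine_like = sample(3.0, 40, rng)  # test sample resembling MGTs

stat_human = relative_statistic(human_like, hwt_ref, mgt_ref)
stat_machine = relative_statistic(machine_like, hwt_ref, mgt_ref)
print("human-like statistic:", stat_human)
print("machine-like statistic:", stat_machine)
```

In this toy setting the statistic is negative for the human-like sample and positive for the machine-like one. A real relative test additionally estimates the variance of this difference to decide whether it is statistically significant.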

**[Online Demo](https://huggingface.co/spaces/songyiliao/R-Detect)**

## Prepare the model and dataset (Optional)

The test script supports both remote and local models/datasets. If you have trouble connecting to Hugging Face, use the script below to download the model and dataset to local storage first.

```bash
chmod +x ./download_model_and_dataset.sh
./download_model_and_dataset.sh
```
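
The local-or-remote behaviour described above can be pictured with the following sketch. It is an illustration only, not the scripts' actual logic: `resolve_source` is a hypothetical helper, and raising an error for a path that is given but missing is a design choice made here, not something the repository documents.

```python
from pathlib import Path
from typing import Optional


def resolve_source(local_path: Optional[str], remote_id: str) -> str:
    """Pick a local directory if one is given, otherwise the remote identifier.

    Hypothetical helper: an empty or missing `local_path` falls back to
    `remote_id`; a path that is given but does not exist raises an error
    instead of silently going remote.
    """
    if local_path:
        path = Path(local_path)
        if path.is_dir():
            return str(path)
        raise FileNotFoundError(
            f"Local path not found: {path}. "
            "Run ./download_model_and_dataset.sh first."
        )
    return remote_id


# No local path given, so the remote identifier is used.
print(resolve_source(None, "roberta-base"))
```

Failing loudly on a mistyped local path avoids an unexpected download attempt on a machine that cannot reach Hugging Face.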

## Run the test

### 0. Install the dependencies

```bash
conda create -n rdetect python=3.12
conda activate rdetect
pip install -r requirements.txt
```

### 1. Generate the feature ref for test
```bash
# --target         Feature ref type, MGT or HWT. Required.
# --sample_size    Sample size of the generated feature ref. Default is 1000; must be larger than 100 and smaller than 30000.
# --use_gpu        Use the GPU.
# --local_model    Path to a local model (download it first). The remote model is used if this is not set.
# --local_dataset  Path to a local dataset. The remote dataset is used if this is not set.
python ./feature_ref_generater.py \
    --target MGT \
    --sample_size 1000 \
    --use_gpu \
    --local_model ./llm-models/roberta-base \
    --local_dataset ./datasets/HC3

python ./feature_ref_generater.py \
    --target HWT \
    --sample_size 1000 \
    --use_gpu \
    --local_model ./llm-models/roberta-base \
    --local_dataset ./datasets/HC3
```
**You must generate the feature ref for both MGT and HWT.**
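
Before moving on, it can help to confirm that both files exist. The sketch below assumes the output files follow the pattern `feature_ref_<TARGET>_<sample_size>.pt` in the working directory, which is inferred from the paths used in the next step rather than documented behaviour. The helper names are hypothetical, and the sample-size bounds are treated as exclusive.

```python
from pathlib import Path

TARGETS = ("HWT", "MGT")


def feature_ref_path(target, sample_size, root="."):
    """Build the assumed file name feature_ref_<TARGET>_<sample_size>.pt."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")
    # Assumption: both bounds are exclusive ("bigger than 100, smaller than 30000").
    if not 100 < sample_size < 30000:
        raise ValueError("sample_size must be larger than 100 and smaller than 30000")
    return Path(root) / f"feature_ref_{target}_{sample_size}.pt"


def missing_feature_refs(sample_size=1000, root="."):
    """Return the expected feature ref files that do not exist yet."""
    expected = [feature_ref_path(t, sample_size, root) for t in TARGETS]
    return [p for p in expected if not p.is_file()]


# Report any file that still has to be generated for the default sample size.
for path in missing_feature_refs(1000):
    print(f"Missing: {path} (run feature_ref_generater.py for this target)")
```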


### 2. Test R-Detect with a text file

```bash
# --test_file        Path of the file to test. Default is demo_text_gpt.txt.
# --use_gpu          Use the GPU.
# --local_model      Path to a local model (download it first). The remote model is used if this is not set.
# --feature_ref_HWT  Path of the HWT feature ref. Required.
# --feature_ref_MGT  Path of the MGT feature ref. Required.
python ./main.py \
    --test_file ./demo_text_gpt.txt \
    --use_gpu \
    --local_model ./llm-models/roberta-base \
    --feature_ref_HWT ./feature_ref_HWT_1000.pt \
    --feature_ref_MGT ./feature_ref_MGT_1000.pt
```


### 3. Run the R-Detect Gradio GUI (Optional)
```bash
python ./feature_ref_generater.py --target HWT --sample_size 500
python ./feature_ref_generater.py --target MGT --sample_size 500
python app.py
```
**This is the same setup used to run the demo on Hugging Face.**

<!-- **TODO:**
```
1. Clean the files, code refactor, do we need to remove two sample tester?
2. Fix the warnings
3. Test all args and functions
4. Requirements
``` -->

<!-- ## Requirements

- An NVIDIA RTX graphics card with 24 GB of memory.
- Python 3.8.19
- Pytorch 2.0.0

More details can be found in the `R-Detect.yml` file.

## Data and pre-trained models

For dataset, we mainly use HC3 while also supporting RAID, Beemo and DetectRL datasets. Only RAID requires manual downloading, the download link is [here](https://github.com/liamdugan/raid), please download the train/extra dataset as needed and put them into the `MGTBenchold/datasets` folder and rename them to `RAID_train.csv`/`RAID_extra.csv` respectively.
For the pre-trained language models, you need to first access them from the following links before running any experiments:

- gpt2-medium:  [download link](https://huggingface.co/openai-community/gpt2-medium/tree/main)
- gpt2-large:  [download link](https://huggingface.co/openai-community/gpt2-large/tree/main)
- t5-large:  [download link](https://huggingface.co/t5-large)
- t5-small:  [download link](https://huggingface.co/t5-small)
- roberta-base:  [download link](https://huggingface.co/FacebookAI/roberta-base/tree/main)
- roberta-base-openai-detector:  [download link](https://huggingface.co/roberta-base-openai-detector/tree/main)
- Hello-SimpleAI/chatgpt-detector-roberta : [download link](https://huggingface.co/Hello-SimpleAI/chatgpt-detector-roberta/tree/main)
- minhtoan/gpt3-small-finetune-cnndaily-news: [download link](https://huggingface.co/minhtoan/gpt3-small-finetune-cnndaily-news/tree/main)
- EleutherAI/gpt-neo-125m: [download link](https://huggingface.co/EleutherAI/gpt-neo-125m/tree/main)
- tiiuae/falcon-rw-1b: [download link](https://huggingface.co/tiiuae/falcon-rw-1b/tree/main)

Please use git clone to download their repos into the pretrained_models folder.

## Environment of R-Detect
You have to create a virtual environment and set up libraries needed for the project.
```
conda env create -f R-Detect.yml
```

## Run basic experiments


**Testing R-Detect with DetectRL dataset under zero-shot settings**

```
CUDA_VISIBLE_DEVICES=0 python run_meta_mmd_trans_combined.py --test_flag --id 10001 --sigma0 55 --lr 0.00005 --no_meta_flag --n_samples 3900 --target_senten_num 3000 --val_num 50 --sigma 30 --max_length 100 --trial_num 10 --num_hidden_layers 1 --target_datasets HC3 --text_generated_model_name chatGPT --base_model_name roberta-base-openai-detector --skip_baselines --mask_flag --transformer_flag --meta_test_flag --epochs 100 --two_sample_test --relative_test --print_details --relative_test_extra_n_samples -1 --test_dataset DetectRL --test_text_n_sample_rounds 10 --test_dataset_answer machine --relative_test_mode normal --relative_test_reference_mode random --test_text_n_sample_tokens 256 --relative_test_alpha 0.05 --test_dataset_answer_mix_ratio 0.5 --output_test_text_file --raid_split train --test_dataset_attack none --faster
```

**Testing R-Detect with RAID dataset under zero-shot settings**

```
CUDA_VISIBLE_DEVICES=0 python run_meta_mmd_trans_combined.py --test_flag --id 10001 --sigma0 55 --lr 0.00005 --no_meta_flag --n_samples 3900 --target_senten_num 3000 --val_num 50 --sigma 30 --max_length 100 --trial_num 10 --num_hidden_layers 1 --target_datasets HC3 --text_generated_model_name chatGPT --base_model_name roberta-base-openai-detector --skip_baselines --mask_flag --transformer_flag --meta_test_flag --epochs 100 --two_sample_test --relative_test --print_details --relative_test_extra_n_samples -1 --test_dataset RAID --test_text_n_sample_rounds 10 --test_dataset_answer machine --relative_test_mode normal --relative_test_reference_mode random --test_text_n_sample_tokens 256 --relative_test_alpha 0.05 --test_dataset_answer_mix_ratio 0.5 --output_test_text_file --raid_split train --test_dataset_attack none --faster
``` -->
