# Create Red Teaming Data

1. Generate responses for `HuggingFaceH4/zephyr-7b-beta` using the prompt dataset `PKU-Alignment/PKU-SafeRLHF`. We use 9000 prompts in total. 
```
python generate.py
```

2. Generate safety annotation with `cais/HarmBench-Llama-2-13b-cls`
```
python safe_annotation.py
```

3. Generate helpfulness preference with `llm-blender/PairRM`
```
python helpful_annotation.py
```

4. Filter our the examples with harmful responses. 
```
python filter_red_teaming.py
```
