# <font color=deeppink>RedTopic<font color=lightgray>: Toward Topic-Diverse Red Teaming of Large Language Models

## What is important for practical red teaming of LLMs?

### [1] Adaptive generation
The evaluation method should generate effective adversarial prompts that keep pace with state-of-the-art LLMs.
### [2] Topical diversity
The prompts should cover diverse harmful goals, such as:
* make a bomb
* assassinate a person
* generate sexual content
* ...

**Are existing methods topically diverse?**
<img src='sourced-materials/tab-diversity.png'>

### [3] Balance effectiveness and diversity

**Do existing methods balance effectiveness and diversity?**
<img src='sourced-materials/fig-pareto.png'>
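
One way to read such a trade-off is through Pareto dominance: a method is on the front if no other method is at least as good on both axes and strictly better on one. The sketch below is only an illustration of that idea, not the evaluation code of this repository. The function name `pareto_front` and the example scores are hypothetical and made up for the example.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is an (effectiveness, diversity) pair; both are maximized.
    """
    front = []
    for i, (e_i, d_i) in enumerate(points):
        dominated = any(
            (e_j >= e_i and d_j >= d_i) and (e_j > e_i or d_j > d_i)
            for j, (e_j, d_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((e_i, d_i))
    return front


# Hypothetical scores for four methods, invented for illustration only.
scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
print(pareto_front(scores))  # [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]
```

Here `(0.5, 0.5)` drops out because `(0.6, 0.6)` is better on both axes, while the remaining three points each win on at least one axis.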

## What is <font color=deeppink>RedTopic <font color=lightgray>?
Our method has three key components:
1. a **contextualized adversarial prompt generation** pipeline;
2. an **aggregate reward design**;
3. a **multi-objective RL training loop**.

<img src="sourced-materials/fig-RedTopic-framework.png">

## Setup

1. Install the required packages:

```
conda create -n redtopic python=3.10 
conda activate redtopic
pip install -r requirements.txt
```

2. Configure ```accelerate``` by running
```
accelerate config
```
then add the path of the generated config file to ```RedTopic/bash_scripts/XXX.sh```. You need at least **3** GPUs with 24 GiB of memory each to run all the experiments.

3. Provide your API keys for ```Aliyun```, ```openai```, ```Gemini```, and ```Deepseek``` in ```RedTopic/supplementary_models.py``` and ```RedTopic/utils/api_generation.py```, at the places marked with ```<YOUR API>```.
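
Before launching the experiments, it can help to verify the GPU requirement from step 2 and confirm that no ```<YOUR API>``` placeholder is left from step 3. The following is a minimal pre-flight sketch, not part of the repository. The helper names `count_large_gpus` and `find_placeholders` are hypothetical, and the 24000 MiB threshold is an assumption chosen because cards sold as 24 GB often report slightly less than 24 * 1024 MiB. It should be run from the directory that contains `RedTopic/`.

```python
import subprocess
from pathlib import Path

PLACEHOLDER = "<YOUR API>"
# The two files named in step 3.
KEY_FILES = ["RedTopic/supplementary_models.py", "RedTopic/utils/api_generation.py"]
MIN_GPUS = 3
MIN_MIB = 24000  # assumption: tolerance below 24 * 1024 MiB


def count_large_gpus(smi_output, min_mib=MIN_MIB):
    """Count GPUs whose total memory in MiB (one value per line) is >= min_mib."""
    values = [line.strip() for line in smi_output.splitlines()]
    return sum(1 for v in values if v.isdigit() and int(v) >= min_mib)


def find_placeholders(paths, placeholder=PLACEHOLDER):
    """Return (path, line_number) pairs where the placeholder is still present."""
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # skip files that do not exist in this checkout
        lines = p.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if placeholder in line:
                hits.append((str(path), number))
    return hits


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        print(f"GPUs with >= {MIN_MIB} MiB: {count_large_gpus(out)} (need {MIN_GPUS})")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not available; GPU check skipped")
    for path, number in find_placeholders(KEY_FILES):
        print(f"{path}:{number}: still contains {PLACEHOLDER}")
```

If the script prints no placeholder lines and reports at least 3 suitable GPUs, the setup steps above are likely complete.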

## Start your Topic Diversity-Driven Red Teaming!
* Run the bash script
```
bash RedTopic/bash_scripts/baseline_benchmark.sh
```
to evaluate the baseline benchmarks against LLMs.

* Run the bash script
```
bash RedTopic/bash_scripts/baseline_RFT.sh
```
to evaluate the baseline RFT-based methods against LLMs.

* Run the bash script
```
bash RedTopic/bash_scripts/ROSE.sh
```
to evaluate RedTopic against LLMs.