# README

## Mutate

To generate mutated sorry-bench dataset, run:
```
python mutate -filepath ../question.jsonl -method role_play -model gpt-4-0613
```
Our current mutation `--method` include:
```bash
# Linguistic Mutation
'question', 'slang', 'uncommon_dialects', 'technical_terms', 'role_play', 'misspellings',

# Persuasion Mutation
'Logical appeal', 'Authority endorsement', 'Misrepresentation', 'Evidence-based Persuasion', 'Expert Endorsement',

# Encoding and Cipher
"ascii", "caesar", "morse", "atbash",

# Translation
"translate-ml", "translate-ta", "translate-mr", "translate-zh-cn", "translate-fr",
```

See [mutate.py](mutate.py) for other details.

**Notice**: To run translation mutations, the code requires `googletrans` package, which is not compatible with `openai` package. Therefore, you need to create a new environment make sure `googletrans` is installed without `openai` via "pip install googletrans==3.1.0a0".

## Model Response Generation

To generate model response to the mutated dataset, first change directory to "FastChat/fastchat/llm_judge" directory, and simply run the model answer generation script with an add a `--data-mutation` option according to the mutation suffix (check [../question_{SUFFIX}.jsonl](../question_{SUFFIX}.jsonl) for the suffix), e.g.:

```bash
python gen_model_answer.py --bench-name sorry_bench --data-mutation question --model-path ckpts/vicuna-7b-v1.5 --model-id vicuna-7b-v1.5
python gen_model_answer_vllm.py --bench-name sorry_bench --data-mutation translate-zh-cn --model-path ckpts/vicuna-7b-v1.5 --model-id vicuna-7b-v1.5 # With `vllm` optimization (suggested)
```


## Decode (for cipher and translation)

**For cipher and translation mutations** ("ascii", "caesar", "morse", "atbash", "translate-ml", "translate-ta", "translate-mr", "translate-zh-cn", "translate-fr"), you need to **decode** the model responses before you evaluate them. You can do this by simply running:

```bash
python decode.py
```

The script will automatically decode any cipher-encoded model responses, and translated the multilingual-mutated model responses back to English. The decoded model responses will be saved at the same location to the original model responses, with a suffix of "_decoded" or "_translated_to_en".

**Notice:** This also requires an independent `google-translate` environment.

## Evaluation

Similarly, to evaluate the model responses to the mutated SORRY-Bench, simply run with an additional `--data-mutation` option according to the mutation suffix (check [../question_{SUFFIX}.jsonl](../question_{SUFFIX}.jsonl) for the suffix), e.g.:
```bash
python gen_judgment_safety.py --bench-name sorry_bench --data-mutation question --model-list vicuna-7b-v1.5
python gen_judgment_safety.py --bench-name sorry_bench --data-mutation translate-zh-cn --judge-model gpt-4-turbo-preview --model-list vicuna-7b-v1.5 # With `vllm` optimization (suggested)
```