## Setup

1. Install as an editable package:

```
pip install -e .
```

2. Create [OpenAI](https://platform.openai.com/docs/api-reference)/[Anthropic](https://docs.anthropic.com/claude/reference/getting-started-with-the-api)/[Google](https://developers.generativeai.google/guide/) API keys and write them to a `.env` file:

```
OPENAI_API_KEY=<key>
ANTHROPIC_API_KEY=<key>
PALM_API_KEY=<key>
```

3. Download [Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3), [Llama-2](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), etc. to a local path using HuggingFace's [snapshot_download](https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder):

```
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir="/my_models/Llama-2-7b-chat-hf", local_dir_use_symlinks=False)
```

## Usage

### Manual red teaming

Launch an interactive session with:

```
python manual_redteam.py --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication --stream
```

### Evaluation

We wrap API calls with unlimited retries for ease of evaluation, which suppresses many API errors. You may want to change the retry functionality to suit your needs.

Run test suite evaluation (defaults to `systematic` test suite):

```
python evaluate.py --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication
```

Run evaluation on `manual` test suite:

```
python evaluate.py --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication --test_dir data/manual --output_dir logs/manual
```

Run GCG suffix evaluation (on a machine with a GPU):

```
python evaluate.py --provider transformers --model vicuna_v1.1@/path/to/model --system_message vicuna_default --test_dir data/justask --output_dir logs/justask --output_tags _gcg=vicuna-7b --suffix_dir logs/gcg_attack/vicuna-7b-v1.3
```

### GCG attack

Run the GCG attack (on a machine with a GPU) with randomized scenario parameters in each iteration:

```
cd gcg_attack
python main_gcg.py --model vicuna_v1.1@/path/to/model --scenario Authentication --behavior withholdsecret
```

You can also run the attack with fixed scenario parameters, which caches prompt + user instruction tokens for significantly faster evaluation:

```
python main_gcg.py --model vicuna_v1.1@/path/to/model --scenario Authentication --behavior withholdsecret --fixed_params
```

Output logs will be stored in `logs/gcg_attack`. 

## Development

### Adding new models

`UselessModel` in [llm_rules/models/base.py](llm_rules/models/base.py) illustrates the expected interface for model objects.
The `__call__` method takes a list of `Message` objects and returns a list of response chunks which when concatenated together form the full response.

See `OpenAIModel` in [llm_rules/models/openai.py](llm_rules/models/openai.py) for a more complete example.

### Inference with vLLM

We implement basic vLLM inference in [llm_rules/models/vllm.py](llm_rules/models/vllm.py) using the `LLM` entrypoint, but this may not be the most efficient approach. `vllm` and `torch` must be installed separately.
