# Agent Red Teaming (ART) Benchmark

This repository contains the code for running the Agent Red Teaming (ART) Benchmark behaviors and judges.

## Usage

This repository provides definitions and implementations for 44 behaviors in `generation/agent_red_teaming/`, consisting of an LLM Agent's system prompt and tool call definitions and implementations along with judges for each behavior. 

This repository provides two scripts for generating outputs/tool calls, and running the judges. The generation script generates one turn of tool calls and responses for a given behavior and model. You can also directly use the behaviors instead and implement your own generationcode; for this, see the function `load_all_configs` in `generation/main.py` which will collect all the behavior configs.

To install the dependencies, run:
```bash
pip install -r requirements.txt
```

### 1. `generation/runner.py`

This script is used to run the ART benchmark behaviors, generating outputs, tool calls, and tool call responses for various behaviors and models. It will send a request to the model and execute any tool calls repeatedly until the model output has no tool calls.

You will need to add OpenAI and/or OpenRouter API keys to the `.env` file at the root of the repository.

```bash
python runner.py --input_path <path_to_submissions> --output_path <output_path>
```

The expected format of the submissions file is a JSON file with the following structure:

```json
{
    "test_model": "",
    "behavior": "",
    "messages": [],
}
```
where `test_model` is the name of the model to test (e.g. `gpt-4o`), `behavior` is the name of the behavior to test (e.g. `leak_function_names`), and `messages` is a list of OpenAI-compatible messages to pass to the model. The model outputs will be stored in the `responses` field in the output file.

The codebase supports models from OpenAI and OpenRouter. The `model_name_to_provider.json` file maps model names to their providers. If you want to run the behaviors on a model that is not in the `model_name_to_provider.json` file, you can add it to the file.


### 2. `judges/both_judges.py`

This script is used to run the LLM and programmatic judges on the outputs of the runner.

**Usage:**

```bash
python both_judges.py --input_path <path_to_generated_outputs> --output_path <output_path>
```

The expected format of the generated outputs file is a JSON file with the following structure:

```json
{
    "test_model": "",
    "behavior": "",
    "messages": [],
    "responses": []
}
```
The `responses` field should be a list of responses, where each response is a list of OpenAI-compatible messages.

The judge results will be stored in the `transfer_eval` field in the output file.

