# 🔍 Codebase Structure 🔍
The HarmBench code is structured around five key elements:
1. Evaluation pipeline scripts
2. Experiments and target models (conceptual framework)
3. The `baselines` folder
4. The `configs` folder
5. The `data` folder

## 1: Evaluation pipeline scripts
The evaluation pipeline consists of four scripts: `generate_test_cases.py`, `merge_test_cases.py`, `generate_completions.py`, and `evaluate_completions.py`. We describe how to use these in the Quick Start section. At a high level, the first two scripts output a `test_cases.json` dictionary for a specific **experiment**. This dictionary contains a list of test cases for each behavior. The last two scripts run the test cases on a **target model** and compute the attack success rate (ASR).
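To make the ASR computation concrete, here is a minimal sketch of one plausible aggregation. It assumes a hypothetical in-memory layout (a dictionary mapping each behavior to a list of 0/1 classifier labels, one per test case) and averages the per-behavior success rates. The function name, the layout, and the choice of aggregation are assumptions for illustration, not the exact logic of `evaluate_completions.py`.

```python
def attack_success_rate(labels_by_behavior):
    """Average the per-behavior success rates (assumed aggregation).

    labels_by_behavior: hypothetical dict mapping a behavior ID to a list of
    0/1 labels, where 1 means the test case was classified as successful.
    """
    # Success rate for each behavior that has at least one test case.
    per_behavior = [
        sum(labels) / len(labels)
        for labels in labels_by_behavior.values()
        if labels
    ]
    if not per_behavior:
        return 0.0
    # Mean over behaviors, so each behavior counts equally.
    return sum(per_behavior) / len(per_behavior)


# Made-up labels for two hypothetical behaviors.
example_labels = {"behavior_a": [1, 0], "behavior_b": [1, 1]}
print(attack_success_rate(example_labels))
```

Averaging per behavior first keeps behaviors with many test cases from dominating the result; if every behavior has the same number of test cases, this equals the plain fraction of successful test cases.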

## 2: Experiments and target models
Each `test_cases.json` file generated by a red teaming method corresponds to a single **experiment**. Each red teaming method can run different experiments, corresponding to test cases optimized for different target models or, more generally, test cases generated with different hyperparameters. By convention, we save test cases in `{base_results_dir}/{method_name}/{experiment_name}/test_cases/test_cases.json`, so example save locations could be:
```
results/GCG/llama2_7b/test_cases/test_cases.json
results/GCG/baichuan2_7b/test_cases/test_cases.json
results/GCG/baichuan2_7b_1000steps/test_cases/test_cases.json
```
The experiment names (llama2_7b, baichuan2_7b, and baichuan2_7b_1000steps) are specified in each method's config file (described below).

After we have a `test_cases.json` file for a specific experiment, we can generate and evaluate completions with the second half of the evaluation pipeline. The models config file (described below) defines a list of **target models**. The target model names determine where completions and results are saved. By convention we save completions in `{base_results_dir}/{method_name}/{experiment_name}/completions/{model_name}.json` and classification results in `{base_results_dir}/{method_name}/{experiment_name}/results/{model_name}.json`. For example, the results of the baichuan2_7b_1000steps experiment from above might be saved in
```
results/GCG/baichuan2_7b_1000steps/completions/baichuan2_7b.json
results/GCG/baichuan2_7b_1000steps/results/baichuan2_7b.json
```
Note that any target model can be used, enabling easy experiments with transfer attacks.

## 3: The `baselines` folder
The `baselines` folder contains the code for the red teaming methods, with each subfolder implementing one or more methods. Each red teaming method is a subclass of `RedTeamingMethod`, which is defined in `baselines/baseline.py`. The available red teaming methods are listed in `baselines/__init__.py`.

(Note: There is an unfortunate clash of terminology between "red teaming method" and "Python method". In most cases, we use "method" to refer to a "red teaming method", each of which corresponds to a class listed in `baselines/__init__.py`.)

## 4: The `configs` folder
The configs folder contains configs for each red teaming method in `configs/method_configs/{method_name}.yaml`, a single `configs/model_configs/models.yaml` config file for all models, and a `configs/pipeline_configs/run_pipeline.yaml` file for use with `scripts/run_pipeline.py`. All scripts can point to different config locations, but these are the defaults.

The method config files each contain a `default_method_hyperparameters` entry that specifies the default inputs to the red teaming method's init function. All other entries in a method config file are treated as experiments, which can contain additional hyperparameters. When a specific experiment name is specified, the corresponding experiment hyperparameters update the default hyperparameters before the red teaming method's class is initialized.
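The way experiment entries override the defaults can be sketched as a dictionary update. This is a minimal illustration, assuming a shallow update and a config already loaded into a Python dictionary; the function name and the hyperparameter names (`num_steps`, `batch_size`) are hypothetical, and the actual parsing code may handle more cases.

```python
import copy


def resolve_hyperparameters(method_config, experiment_name=None):
    """Merge experiment hyperparameters over the defaults (assumed shallow update)."""
    # Copy so the defaults in the loaded config are never mutated.
    params = copy.deepcopy(method_config["default_method_hyperparameters"])
    if experiment_name is not None:
        # Experiment-specific values take precedence over the defaults.
        params.update(method_config[experiment_name])
    return params


# Hypothetical method config, as it might look after loading the YAML file.
example_config = {
    "default_method_hyperparameters": {"num_steps": 500, "batch_size": 64},
    "baichuan2_7b_1000steps": {"num_steps": 1000},
}

print(resolve_hyperparameters(example_config, "baichuan2_7b_1000steps"))
```

The resolved dictionary is what would then be passed to the red teaming method's init function, for example as keyword arguments.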

The models config file contains one entry per model. The parameters in each model's config dictionary are used to initialize the model and tokenizer. To make adding new models easy, we support dynamic experiment config parsing, where experiment names can include model name templates and reference the values in the corresponding model config. This is explained further in the [config docs](./configs.md).

The pipeline config file contains entries describing experiments and attack classes that correspond to proper methods. We distinguish "proper methods" from "methods". The latter refers to the classes that define red teaming methods in the `baselines` folder, while the former corresponds to specific experiments in a method config. For example, `GCG-Transfer` is a proper method name corresponding to the `llama2_7b_vicuna_7b_llama2_13b_vicuna_13b_multibehavior_1000steps` experiment for the `EnsembleGCG` class. The pipeline config is explained further in the [evaluation pipeline docs](./evaluation_pipeline.md).
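The relationship between a proper method and a method class can be pictured as a lookup from one name to a (class, experiment) pair. The sketch below uses the `GCG-Transfer` example from the paragraph above; the dictionary layout, the key names `method_class` and `experiment_name`, and the function name are hypothetical and do not reflect the actual schema of the pipeline config file.

```python
# Hypothetical in-memory mapping; the real pipeline config is a YAML file
# with its own schema, described in the evaluation pipeline docs.
PROPER_METHODS = {
    "GCG-Transfer": {
        "method_class": "EnsembleGCG",
        "experiment_name": (
            "llama2_7b_vicuna_7b_llama2_13b_vicuna_13b_multibehavior_1000steps"
        ),
    },
}


def resolve_proper_method(proper_method_name):
    """Return the (method class name, experiment name) pair for a proper method."""
    entry = PROPER_METHODS[proper_method_name]
    return entry["method_class"], entry["experiment_name"]


method_class, experiment_name = resolve_proper_method("GCG-Transfer")
print(method_class)
print(experiment_name)
```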

## 5: The `data` folder
The data folder contains the following subfolders:
- `behavior_datasets`: The HarmBench behavior datasets are stored here as CSV files. We split the full set of HarmBench behaviors into text and multimodal behaviors, as these use different formats. We also provide the val and test splits of behaviors. As an example of how other behavior datasets can be used with our evaluation framework, we also include the AdvBench and TDC 2023 Red Teaming Track behaviors in `behavior_datasets/extra_behavior_datasets`. For more details on the behavior datasets, see the [behavior dataset docs](./behavior_datasets.md).
- `copyright_classifier_hashes`: The hashes used for the copyright classifier on copyright behaviors are stored here. These are loaded in when evaluating whether completions contain parts of the book or song lyrics specified in a copyright behavior.
- `multimodal_behavior_images`: The images used in multimodal behaviors.
- `optimizer_targets`: The targets used for optimization by many of the red teaming methods. We provide a standard set of targets for HarmBench behaviors, similar to the targets used in the GCG paper. We provide additional sets of optimizer targets in `optimizer_targets/extra_targets`, including targets for adversarial training, targets for AdvBench, and custom targets for specific models.

## Jupyter Notebooks
In addition to the code for running the evaluation pipeline, we include Jupyter notebooks in the `notebooks` folder that can help with using HarmBench and developing new red teaming methods:
- `methods`: This subfolder contains notebooks with standalone implementations of some of the red teaming methods, which may be easier to tinker with than the implementations in the `baselines` folder.
- `analyze_results.ipynb`: This notebook parses results saved by the evaluation pipeline and prints out ASR values so they can be easily copy-pasted into a spreadsheet. We recommend using Google Sheets combined with the [Spread-LaTeX](https://workspace.google.com/marketplace/app/spreadlatex/218144906748) extension for converting into LaTeX tables. In addition to displaying results, this is also useful for troubleshooting missing results.
- `run_classifier.ipynb`: This notebook contains code for running classifiers for computing whether test cases are successful.