# HarmBench

HarmBench is a harmful-behavior dataset that was originally proposed in the paper ["HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal"](https://arxiv.org/abs/2402.04249).

The data here was copied from [https://github.com/centerforaisafety/HarmBench/tree/main/data](https://github.com/centerforaisafety/HarmBench/tree/main/data).

## Original Repository

- [https://github.com/centerforaisafety/HarmBench](https://github.com/centerforaisafety/HarmBench)

## Citation

```
@article{mazeika2024harmbench,
  title={HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal},
  author={Mantas Mazeika and Long Phan and Xuwang Yin and Andy Zou and Zifan Wang and Norman Mu and Elham Sakhaee and Nathaniel Li and Steven Basart and Bo Li and David Forsyth and Dan Hendrycks},
  year={2024},
  eprint={2402.04249},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

See [LICENSE](./LICENSE).