# Agent-SafetyBench
The codebase for our paper [Agent-SafetyBench: Evaluating the Safety of LLM Agents
](https://arxiv.org/abs/2412.14470). Agent-SafetyBench is a comprehensive agent safety evaluation benchmark that introduces a diverse array of novel environments that are previously unexplored, and offers broader and more systematic coverage of various risk categories and failure modes.

![overview](./figures/overview.png)

## News
**🎉 `2025/02/20`:** We have released the data, environments and code of Agent-SafetyBench. The data is also available on [Huggingface🤗](https://huggingface.co/datasets/thu-coai/Agent-SafetyBench).

## Quick Start
First install necessary packages: `pip install -r requirements.txt`.

Then you can evaluate different models: 
```
cd evaluation
bash eval.sh
```
You can change the `model_name` parameter in `evaluation/eval.sh` to evaluate different models. Please refer to `evaluation/eval.py` for the complete list of supported models. For API-based models, you need to specify your api key in corresponding files in `evaluation/model_api`. We use [OpenRouter](https://openrouter.ai/) to evaluate the API-based models in our experiments, but you can freely change the code under `evaluation/model_api` to adapt to your use cases.

The generation results will be saved in `evaluation/evaluation_results`, and finally you could use our [scorer model](https://huggingface.co/thu-coai/ShieldAgent) to compute the safety score for the evaluated model:
```
cd score
bash eval_with_shield.sh
```

## Citation
If you find our work helpful, please kindly cite our paper.
```
@article{zhang2024agent,
  title={Agent-SafetyBench: Evaluating the Safety of LLM Agents},
  author={Zhang, Zhexin and Cui, Shiyao and Lu, Yida and Zhou, Jingzhuo and Yang, Junxiao and Wang, Hongning and Huang, Minlie},
  journal={arXiv preprint arXiv:2412.14470},
  year={2024}
}
```
