# AdvBench

AdvBench is a dataset of malicious instructions that was originally proposed in the paper ["Universal and Transferable Adversarial Attacks on Aligned Language Models"](https://arxiv.org/abs/2307.15043).

We copied it here from [https://github.com/llm-attacks/llm-attacks/tree/main/data/advbench](https://github.com/llm-attacks/llm-attacks/tree/main/data/advbench).
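
As a minimal sketch of how the copied files might be read, the snippet below loads a CSV file with a header row into a list of dictionaries. The file name `harmful_behaviors.csv` and its location in this directory are assumptions based on the upstream repository layout, and `load_csv_rows` is a hypothetical helper, not part of any library; adjust the path to match what is actually present here.

```python
import csv
from pathlib import Path


def load_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts (hypothetical helper)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Assumed file name, taken from the upstream repository layout.
    path = Path("harmful_behaviors.csv")
    if path.exists():
        rows = load_csv_rows(path)
        columns = list(rows[0].keys()) if rows else []
        print(f"{len(rows)} rows; columns: {columns}")
    else:
        print(f"{path} not found; adjust the path to your local copy.")
```

Using `csv.DictReader` keeps the sketch free of third-party dependencies and makes no assumption about which columns a given file contains.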

## Original Repository

- [https://github.com/llm-attacks/llm-attacks](https://github.com/llm-attacks/llm-attacks)

## Citation

```
@misc{zou2023universal,
      title={Universal and Transferable Adversarial Attacks on Aligned Language Models},
      author={Andy Zou and Zifan Wang and J. Zico Kolter and Matt Fredrikson},
      year={2023},
      eprint={2307.15043},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

See [LICENSE](./LICENSE).