---
language:
- en
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: category
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 58879
    num_examples: 313
  download_size: 31289
  dataset_size: 58879
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
---

# StrongREJECT

StrongREJECT is a benchmark of 313 malicious prompts for evaluating jailbreak attacks against LLMs. It is designed to reveal whether a jailbreak actually enables malicious actors to use LLMs for harmful tasks.

Dataset link: [https://github.com/alexandrasouly/strongreject/blob/main/strongreject_dataset/strongreject_dataset.csv](https://github.com/alexandrasouly/strongreject/blob/main/strongreject_dataset/strongreject_dataset.csv)
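As a quick sanity check, the CSV linked above can be read with the Python standard library and tallied per category. The snippet below is a minimal sketch, not an official loader. It assumes that the raw file is reachable by rewriting the GitHub `blob` URL to its `raw.githubusercontent.com` form, and that the CSV has a `category` column as listed in this card's metadata (the other column names in the CSV may differ from the card). The helper names `to_raw_url`, `count_by_category`, and `load_counts_from_github` are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter
from urllib.request import urlopen

# URL taken from the dataset link in this card.
BLOB_URL = (
    "https://github.com/alexandrasouly/strongreject/blob/main/"
    "strongreject_dataset/strongreject_dataset.csv"
)


def to_raw_url(blob_url: str) -> str:
    """Rewrite a GitHub 'blob' page URL to a raw-content URL (assumed URL pattern)."""
    raw = blob_url.replace(
        "https://github.com/", "https://raw.githubusercontent.com/", 1
    )
    return raw.replace("/blob/", "/", 1)


def count_by_category(csv_text: str, column: str = "category") -> Counter:
    """Count rows per value of `column` in CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def load_counts_from_github(blob_url: str = BLOB_URL) -> Counter:
    """Download the CSV (requires network access) and count prompts per category."""
    with urlopen(to_raw_url(blob_url)) as response:
        text = response.read().decode("utf-8")
    return count_by_category(text)
```

Calling `load_counts_from_github()` returns a `Counter` keyed by category. If the download and column assumptions hold, the counts should sum to the 313 examples reported in the metadata.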


# Citation
If you find the dataset useful, please cite the following work:
```
@misc{souly2024strongreject,
      title={A StrongREJECT for Empty Jailbreaks},
      author={Alexandra Souly and Qingyuan Lu and Dillon Bowen and Tu Trinh and Elvis Hsieh and Sana Pandey and Pieter Abbeel and Justin Svegliato and Scott Emmons and Olivia Watkins and Sam Toyer},
      year={2024},
      eprint={2402.10260},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```