# Diverse and Effective Red-Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

This repository includes the code for generating rewards. `generate_rewards.py` processes the data from Anthropic's Harmless dataset and generates attacker goals. `prompt_injection_examples.ipynb` is a Jupyter notebook that generates example prompt injection goals.
