**Note:** For responsible release, we have removed the parts of the dataset that may contain harmful content.

### Run the code

**Step 1: Download Data and LLMs**

* (a): For LLMs, download them from [huggingface.co](https://huggingface.co); for instance, [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).
* (b): For data, we provide a few safety datasets in `ft_datasets/`; you can add more from [this GitHub repository](https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety/tree/main/llama2/ft_datasets).
* (c): See [this guide](https://huggingface.co/docs/hub/en/models-downloading) for how to download Hugging Face models.
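
As a rough illustration of (a) and (c), one way to fetch a model from Python is the `snapshot_download` function of the `huggingface_hub` package. This is only a sketch: the `models/` target folder and the helper names `local_dir_for` and `download_model` are assumptions made for this example, not part of this repository.

```python
from pathlib import Path


def local_dir_for(repo_id, root="models"):
    """Map 'org/name' to '<root>/name' (a hypothetical local layout)."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id, root="models"):
    """Download all files of a Hugging Face Hub repository into a local folder."""
    # Imported lazily so that local_dir_for works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    # Gated repositories (such as the Meta Llama models) require that you have
    # accepted the license on the Hub and authenticated beforehand.
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_model("meta-llama/Meta-Llama-3-8B")` would then place the files under `models/Meta-Llama-3-8B`. If the scripts expect a different location, adjust `root` accordingly.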

**Step 2: Running Data Curation**

* Use the `.sh` scripts in `curation/scripts` to curate your data. Results are saved to `curation/small_ppl_dataset/<model name>/<data name>/<round>.jsonl`.
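
To inspect the curated output, a minimal sketch like the following may help. It only assumes the path pattern given above and that each file is standard JSON Lines (one JSON object per line). The helper names are hypothetical, and the record fields are not assumed, since they depend on the curation scripts.

```python
import json
from pathlib import Path

# Root directory taken from the output path pattern in Step 2.
CURATION_ROOT = Path("curation/small_ppl_dataset")


def curated_path(model_name, data_name, round_id, root=CURATION_ROOT):
    """Build <root>/<model name>/<data name>/<round>.jsonl."""
    return Path(root) / model_name / data_name / f"{round_id}.jsonl"


def load_jsonl(path):
    """Read a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Placeholder names: substitute the model, dataset, and round you used.
    path = curated_path("my_model", "my_dataset", 1)
    if path.exists():
        print(f"{path}: {len(load_jsonl(path))} records")
    else:
        print(f"{path} not found; run a curation script first")
```

How `<round>` is written (for example `0`, `1`, ...) is an assumption here; check the file names that the scripts actually produce.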

**Step 3: Running Pipeline**

* Use the `.sh` scripts in `ft_llm/scripts` to run LLM fine-tuning, adjusting them for the amount of data and the number of epochs you set. This part of the code is derived from [this GitHub repository](https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety/).

* For example, for pre-attack defense, use `ft_llm/scripts/base.sh`. For in-attack defense, use `ft_llm/scripts/regmix_ft.sh` and `ft_llm/scripts/regmix_test.sh`.
