# Adaptive-Finetuning-Attacks
Testing Adaptive Fine-tuning Attacks against Potential Mitigations

**IMPORTANT:** Due to repository size limits, we have removed some dataset files from this repo, including CTFTime (https://huggingface.co/datasets/justinwangx/CTFtime), pilebio (https://huggingface.co/datasets/lapisrocks/pile-bio), hellaswag, and sql_create_context. If you want to run experiments on these datasets, please contact the author.

## Create environment

Use the following command to create the conda environment:

```shell
conda env create -f environment.yml
```
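`conda env create -f environment.yml` names the new environment after the top-level `name:` key in that file. If you need that name for activation, the helper below is a minimal sketch for reading it. It assumes the key is unquoted, and `env_name_from_yml` is a hypothetical helper, not part of this repo.

```shell
# Print the environment name declared in a conda environment file.
# Reads the first top-level "name:" line; assumes the value is unquoted.
env_name_from_yml() {
  local yml="${1:-environment.yml}"
  # Indented keys (e.g. inside dependencies) do not match ^name:
  awk '/^name:/ {print $2; exit}' "$yml"
}
```

Usage: `conda activate "$(env_name_from_yml environment.yml)"`. Inside a non-interactive script, run `eval "$(conda shell.bash hook)"` first so that `conda activate` is available.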
## Run fine-tuning attack and save checkpoints
The main entry point is ``finetune.py``. For a demo run, use ``scripts/launch_ft.slurm``, where you can specify the dataset, the base model, the save path, and other fine-tuning configurations.
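Slurm batch scripts are ordinary shell scripts whose `#SBATCH` lines are comments to bash, so a demo script can be submitted with `sbatch` on a cluster or, on a machine without Slurm (e.g. an interactive GPU node), run directly with `bash`. The helper below is a minimal sketch of that choice, assuming the launch script does not rely on Slurm-provided variables such as `SLURM_JOB_ID`. `submit_job` is a hypothetical name, not part of this repo, and when the script runs under plain bash its `#SBATCH` resource requests are not applied.

```shell
# Submit a launch script through Slurm when sbatch is on PATH;
# otherwise run it directly with bash (#SBATCH lines are then ignored).
submit_job() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "error: $script not found" >&2
    return 1
  fi
  if command -v sbatch >/dev/null 2>&1; then
    sbatch "$script"
  else
    bash "$script"
  fi
}
```

Usage: `submit_job scripts/launch_ft.slurm`. The same helper works for the other `.slurm` demo scripts.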

## Run safety-evaluation on a given checkpoint
The main entry point is ``eval_safety_vllm.py``. For a demo run, use ``scripts/launch_safety_eval.slurm``, where you can specify the safety benchmark, the base model, the output file path, and other generation configs.
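Because safety evaluation runs on a checkpoint produced by fine-tuning, one option is to chain the two demo scripts with a Slurm job dependency, using the real `sbatch` options `--parsable` and `--dependency=afterok:<jobid>`. This is a minimal sketch, assuming the checkpoint path in the safety-evaluation script already matches the save path in the fine-tuning script. `chain_after` and the `DRY_RUN` switch are hypothetical, not part of this repo.

```shell
# Chain two Slurm jobs so the second starts only after the first
# completes successfully. Set DRY_RUN=1 to print the commands instead.
chain_after() {
  local first="$1" second="$2" jid
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sbatch --parsable $first"
    echo "sbatch --dependency=afterok:<jobid> $second"
    return 0
  fi
  jid=$(sbatch --parsable "$first") || return 1
  jid="${jid%%;*}"  # --parsable may print "jobid;cluster"; keep the job id
  sbatch --dependency="afterok:${jid}" "$second"
}
```

Usage: `chain_after scripts/launch_ft.slurm scripts/launch_safety_eval.slurm`.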

## Run utility evaluation on a given checkpoint
Some of our utility benchmarks use a GPT judge, which requires internet access, so we separate the inference and evaluation pipelines.
### Run utility inference
The main entry point is ``inference_utility_vllm.py``. For a demo run, use ``scripts/launch_utility_inference.slurm``, where you can specify the model path, the utility benchmark, the output file path, and other generation configs. Inference writes a raw output file to the specified path.
### Run utility evaluation
The main entry point is ``eval_utility_vllm.py``. For a demo run, use ``scripts/launch_utility_eval.sh``, where you can specify the benchmark to evaluate and the model name.
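Since utility evaluation runs separately from inference and calls a GPT judge over the network, it can help to check the inputs before launching it. This is a minimal sketch, assuming the judge uses the OpenAI API and reads its key from `OPENAI_API_KEY`; neither that variable name nor `preflight_utility_eval` is confirmed by this repo, so adjust both to match `eval_utility_vllm.py`.

```shell
# Fail fast before utility evaluation: the raw inference output must exist
# and be non-empty, and an API key for the GPT judge must be set.
# ASSUMPTION: the judge reads its key from OPENAI_API_KEY.
preflight_utility_eval() {
  local raw_output="$1"
  if [ ! -s "$raw_output" ]; then
    echo "error: raw inference output '$raw_output' is missing or empty" >&2
    return 1
  fi
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "error: OPENAI_API_KEY is not set" >&2
    return 1
  fi
  echo "ok: ready to evaluate $raw_output"
}
```

Usage (the output path is a placeholder): `preflight_utility_eval path/to/raw_output && bash scripts/launch_utility_eval.sh`.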