<div align= "center">
    <h1>Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning</h1>
</div>

This repository contains the code and data for running the experiments from our paper. The `datasets` directory contains the three datasets used in our experiments (SST-2, OffensEval, and AG's News), along with subdirectories that hold their poisoned versions.

**Getting Started**
--
To install all of the necessary dependencies, run:
```
pip3 install -r requirements.txt
```


**PETA Attacks**
---
PETA consists of two stages: (1) **bilevel optimization**, which inserts the backdoor into a general-purpose pre-trained language model, and (2) **parameter-efficient fine-tuning** on a clean dataset.

To run an attack, select a trigger from {sentence, style, syntax} and one of the three datasets, then identify the subdirectory of `datasets` that corresponds to this combination; it will be used for both training and evaluation. In this example, we select SST-2 and the style trigger, so the dataset is `datasets/sst2-style`. All of the commands below assume that `datasets/sst2-style` has been moved into the same directory as the script being run.
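
The staging step described above can be scripted. The following is a minimal sketch, not part of the repository: `stage_dataset` is a hypothetical helper, and it assumes that the poisoned dataset lives at `datasets/<name>` under the repository root and that `script_dir` is the directory containing the script you plan to run. It copies the directory rather than moving it, so the original stays in place.

```python
import shutil
from pathlib import Path


def stage_dataset(name, repo_root=".", script_dir="."):
    """Copy datasets/<name> next to the scripts (hypothetical helper).

    Assumes the layout <repo_root>/datasets/<name>.
    Returns the path of the staged copy.
    """
    source = Path(repo_root) / "datasets" / name
    if not source.is_dir():
        raise FileNotFoundError(f"no dataset directory at {source}")
    target = Path(script_dir) / name
    # dirs_exist_ok=True (Python 3.8+) makes the call safe to repeat.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For instance, calling `stage_dataset("sst2-style")` from the repository root would create `./sst2-style`, which is the location the commands in this README assume.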

### Stage 1
The script for the first stage is `bilevel_experiment4.py`. To start training, run the following:
```
CUDA_VISIBLE_DEVICES=0 python3 bilevel_experiment4.py --batch_size 16 --lr 2e-5 --num_gpus 1 --num_epochs 2 --num_warmup_epochs 0 --output_path rand --huggingface_token <...> --model_type adapter --adapter_r 2 --train_dataset sst2-style
```
This example pairs PETA with adapters, which requires `--model_type adapter --adapter_r 2`. To use another PEFT method, replace those flags with one of the following:

 - `--model_type PrefixTuning --prefixtuning_l 4`
 - `--model_type LoRA --lora_r 2 --lora_alpha 2`

Provide a `--huggingface_token` to save the model. Without a saved model, the second stage cannot be run.
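
To make the flag combinations above concrete, here is a small sketch that assembles the Stage 1 command line for each PEFT method. `PEFT_FLAGS` and `stage1_command` are hypothetical names introduced for illustration; the flag values are copied from the example command and the list above, and the sketch only builds and prints each command rather than running it.

```python
import shlex

# PEFT-specific flags, copied from the examples in this README.
PEFT_FLAGS = {
    "adapter": ["--adapter_r", "2"],
    "PrefixTuning": ["--prefixtuning_l", "4"],
    "LoRA": ["--lora_r", "2", "--lora_alpha", "2"],
}


def stage1_command(model_type, train_dataset, output_path, token):
    """Return the Stage 1 argv list for one PEFT method (illustrative)."""
    if model_type not in PEFT_FLAGS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    return [
        "python3", "bilevel_experiment4.py",
        "--batch_size", "16", "--lr", "2e-5",
        "--num_gpus", "1", "--num_epochs", "2",
        "--num_warmup_epochs", "0",
        "--output_path", output_path,
        "--huggingface_token", token,
        "--model_type", model_type, *PEFT_FLAGS[model_type],
        "--train_dataset", train_dataset,
    ]


# Print one command per PEFT method; "<...>" stands for your own token.
for method in PEFT_FLAGS:
    print(shlex.join(stage1_command(method, "sst2-style", "rand", "<...>")))
```

Prefix a printed command with `CUDA_VISIBLE_DEVICES=0`, as in the example above, to pin it to a specific GPU.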

### Stage 2
The script for the second stage is `finetune_all_layerwise.py`. It loads the model from the previous stage (specified with `--model_load`), freezes its weights, inserts the additional PEFT parameters, and tunes only those parameters on a clean dataset. In the example below, `--model_load rand` matches the `--output_path rand` used in Stage 1:

```
CUDA_VISIBLE_DEVICES=0 python3 finetune_all_layerwise.py --batch_size 16 --num_gpus 1 --lr 2e-4 --num_epochs 5 --train --output_path <...> --data_dir sst2-style/ --model_load rand --model_type adapter --adapter_r 2 --train_path sst2-style/train4clean.csv --petl_rm_eval --save --huggingface_token <...>
```
Note that `--train_path` must point to the `train4clean.csv` file. Additionally, `--petl_rm_eval` removes all of the PEFT parameters from the final victim model and recomputes the LFR and ACC (as was done in the paper).

**Defense Experiments**
--
The script for running the defense is `defense_experiment2.py`. It requires a CSV file, passed via `--write_file`, that contains the following columns in this order: PETL, Dataset, Mode, RM Layers, Val CACC, Val LFR, Test CACC, Test LFR.
```
CUDA_VISIBLE_DEVICES=0 python3 defense_experiment2.py --batch_size 16 --num_gpus 1 --lr 3e-4 --num_epochs 8 --output_path <...> --data_dir sst2-style/ --model_load rand --model_type adapter --adapter_r 2 --train_path sst2-style/train4clean.csv --rm_layers 0 --unfreeze_attn --write_file file.csv
```
Note that, as before, `--train_path` must point to the `train4clean.csv` file. `--rm_layers` specifies which layer to unfreeze and strip of PEFT parameters before training. Select one of the three unfreezing modes: `--unfreeze_all`, `--unfreeze_attn`, or `--unfreeze_attn_lyn`.
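
The results file passed to `--write_file` can be prepared ahead of time. The sketch below assumes that the script expects a header row with exactly the column names listed above; this is an assumption about `defense_experiment2.py`, so check the script before relying on it. `COLUMNS` and `create_results_file` are hypothetical names used only for illustration.

```python
import csv
from pathlib import Path

# Column names in the order listed in this README.
COLUMNS = ["PETL", "Dataset", "Mode", "RM Layers",
           "Val CACC", "Val LFR", "Test CACC", "Test LFR"]


def create_results_file(path="file.csv"):
    """Write the header row if the file does not exist yet (assumed format)."""
    path = Path(path)
    if not path.exists():
        # newline="" is what the csv module recommends for files it writes.
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerow(COLUMNS)
    return path
```

Calling `create_results_file("file.csv")` before running the defense leaves an existing file untouched, so earlier results are not overwritten.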

