# Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
**Abstract:** Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as *robust membership inference* and *canary data extraction*. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify risks over LLMs' full pretrain-adapt pipeline.


## How to run the code
---

### 1. Installation
To get started, you need to install the required Python dependencies. The repository includes a `requirements.txt` file that lists all necessary packages. Simply run:
```bash
pip install -r requirements.txt
```

---

### 2. Fine-tuning Models
The core functionality of this repository is to fine-tune pre-trained language models for specific tasks. Key aspects include:

- **Model & Task Configuration:**  
  Use `--model_name_or_path` to specify the pre-trained model you want to fine-tune (for instance: "EleutherAI/pythia-1b"), and `--task_name` to define the downstream dataset (for example: "samsum", "german_wiki", "pile_bookcorpus2_val", "pile_bookcorpus2_train", ...).

- **Adaptation Techniques:**  
  The script supports advanced fine-tuning strategies like **LoRA** (Low-Rank Adaptation) and **prefix tuning**. Flags such as `--lora` and `--prefix` allow you to enable these methods. You can further control the behavior with parameters like `--pre_seq_len` (for prefix tuning).

- **Training Dynamics:**  
  Numerous hyperparameters are available to customize your training process:
  - `--per_device_train_batch_size`: Batch size per device.
  - `--learning_rate`: The learning rate for optimization.
  - `--num_train_epochs`: The total number of training epochs.
  - `--gradient_accumulation_steps`: To manage memory usage and simulate larger batch sizes.
  - `--weight_decay`, `--max_grad_norm`: Regularization and gradient clipping parameters.

- **Privacy Considerations:**  
  `--target_epsilon` sets the target privacy budget (ε) for training, and `--per_sample_max_grad_norm` sets the per-sample gradient clipping norm. The `--disable_dp` flag turns off differential privacy entirely.

- **Canary Prefixes:**  
  To add a canary prefix, use the following arguments:
  - `--prefix_type`: Canary prefix type ("none", "rare", "common", "random", "invisible").
  - `--prefix_length`: Length of the canary prefix.
  - `--topk`: Size of the vocabulary.
  - `--ratio_change`: Fraction of canaries added.

- **Output Management:**  
  The `--output_dir` parameter specifies where to save the trained model checkpoints.
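To make the privacy parameters above more concrete, here is a minimal sketch of one differentially private gradient step, assuming the script follows the standard DP-SGD recipe (clip each per-sample gradient, sum, add Gaussian noise, average). The function names and the `noise_multiplier` argument are hypothetical and not taken from this repository. In practice, a privacy accountant derives the noise level from the target ε, which is not shown here.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_average_gradient(per_sample_grads, max_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient, sum them, add Gaussian noise, then average.

    max_norm plays the role of --per_sample_max_grad_norm.
    noise_multiplier is a hypothetical knob; a real run would derive it
    from --target_epsilon with a privacy accountant.
    """
    rng = rng or random.Random()
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation scales with the clipping norm,
    # which bounds each example's contribution to the sum.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.0,
                              rng=random.Random(0)))
```

A smaller clipping norm limits how much any single example can move the model, while a larger noise multiplier gives a smaller ε at the cost of utility.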

A command for fine-tuning might look like this:
```bash
python run_train.py \
    --model_name_or_path {model_name_or_path} \
    --task_name {task} \
    --lora {lora} \
    --prefix {prefix} \
    --pre_seq_len {pre_seq_len} \
    --last_layer {last_layer} \
    --output_dir {output_dir} \
    --do_train \
    --do_eval \
    --logging_step 100 \
    --per_device_train_batch_size {bs} \
    --learning_rate {lr} \
    --num_train_epochs {epoch} \
    --overwrite_output_dir \
    --save_strategy no \
    --disable_dp {disable_dp} \
    --shadow_id {shadow_id} \
    --remove_unused_columns False \
    --weight_decay {weight_decay} \
    --max_grad_norm {max_grad_norm} \
    --label_names labels \
    --evaluation_strategy steps \
    --eval_steps 2000 \
    --prefix_type {prefix_type} \
    --data_cache_dir {cache_dir} \
    --prefix_length {prefix_length} \
    --target_epsilon {target_epsilon} \
    --per_sample_max_grad_norm {per_sample_max_grad_norm} \
    --gradient_accumulation_steps {gradient_accumulation_steps}
```
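When sweeping over several configurations, it can be convenient to fill in the placeholders of the command above programmatically. The following is a small sketch of one way to do that. The helper `build_train_command` and all values in `example_config` are illustrative assumptions, not settings from the paper, and only a subset of the flags is shown, so the script may need others.

```python
import shlex


def build_train_command(config, script="run_train.py"):
    """Turn a dict of settings into an argv list for the training script."""
    argv = ["python", script]
    for flag, value in config.items():
        argv.append(f"--{flag}")
        if value is not None:  # None marks a bare switch such as --do_train
            argv.append(str(value))
    return argv


# Illustrative values only; they are not the settings used in the paper.
example_config = {
    "model_name_or_path": "EleutherAI/pythia-1b",
    "task_name": "samsum",
    "output_dir": "out/pythia-1b-samsum",
    "do_train": None,
    "do_eval": None,
    "per_device_train_batch_size": 8,
    "learning_rate": 5e-4,
    "num_train_epochs": 3,
    "target_epsilon": 8,
    "per_sample_max_grad_norm": 1.0,
}

if __name__ == "__main__":
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_train_command(example_config)))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues for paths that contain spaces.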

---

### 3. Privacy Attack Evaluations

Once the model is fine-tuned, you can assess the privacy risks of the adaptation with the following scripts:

- **Membership Inference Attacks:**  
  The `run_mia.py` script runs several membership inference attacks against the fine-tuned checkpoint.
  Two of them, RMIA and Reference, require a reference model, which must be provided via `--reference_checkpoint`.
  The results are stored in the directory given by `--output_dir`.
  The following attacks are supported:
  - RMIA (Robust Membership Inference Attacks)[1]
  - Reference [2]
  - Min-K% [3]

  ```bash
  python run_mia.py --init_checkpoint {path} --reference_checkpoint {ref_path} --output_dir {output_dir}
  ```

- **Exposure (Data Extraction) Analysis:**  
  The `run_data_extraction.py` script checks how much sensitive information might be extracted from the model. Using the trained model checkpoint (`--init_checkpoint`), the script attempts to extract or expose data, with results saved to the output directory provided via `--out_dir`.

  ```bash
  python run_data_extraction.py --init_checkpoint {path} --out_dir {path}
  ```
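For intuition, the scores behind two of the evaluations above can be sketched in a few lines. This is a simplified illustration, not the code in `run_mia.py` or `run_data_extraction.py`: `min_k_percent_score` follows the general idea of Min-K% [3] (average the log-probabilities of the least likely tokens), and `exposure` uses the common rank-based definition for canaries, which is assumed here rather than confirmed by this repository. Function names and inputs are hypothetical.

```python
import math


def min_k_percent_score(token_log_probs, k=0.2):
    """Mean log-probability of the fraction k of least likely tokens.

    A higher (less negative) score suggests the sequence is more
    likely to have been seen during training.
    """
    n = max(1, int(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def exposure(canary_rank, num_candidates):
    """Exposure in bits: log2(number of candidates) - log2(rank).

    canary_rank is the 1-based rank of the true canary among all
    candidate sequences, ordered from most to least likely under the model.
    """
    return math.log2(num_candidates) - math.log2(canary_rank)


if __name__ == "__main__":
    log_probs = [-0.1, -0.2, -3.0, -4.0, -0.5]
    print(min_k_percent_score(log_probs, k=0.4))
    print(exposure(canary_rank=1, num_candidates=1024))
```

A canary ranked first among all candidates reaches the maximum exposure, while a canary ranked last has an exposure of zero bits.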


## References

[1] Zarifzadeh, S., Liu, P., & Shokri, R. (2024). Low-Cost High-Power Membership Inference Attacks. In Proceedings of the 41st International Conference on Machine Learning, PMLR 235:58244-58282. https://proceedings.mlr.press/v235/zarifzadeh24a.html\
[2] Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., ... & Raffel, C. (2021). Extracting Training Data from Large Language Models. In 30th USENIX Security Symposium (USENIX Security 21) (pp. 2633-2650).\
[3] Shi, W., Ajith, A., Xia, M., Huang, Y., Liu, D., Blevins, T., ... & Zettlemoyer, L. (2024). Detecting Pretraining Data from Large Language Models. In The Twelfth International Conference on Learning Representations.

