# Differentially Private Relational Learning with Entity-level Privacy Guarantees

This repository provides tools for fine-tuning large language models on sensitive graph-structured data for relational learning under entity-level differential privacy. It uses Opacus for private training and LoRA for parameter-efficient tuning, and supports privacy accounting and relation prediction on text-attributed graphs.


## Repository Structure
- `Lora_SeqLP_prv.py`: Main script for private fine-tuning with LoRA and relation prediction tasks.
- `arguments.py`: Configurable arguments and CLI interface.
- `dataset.py`: Dataset preprocessing and loading utilities.
- `model.py`: Model wrapper with support for HuggingFace transformers.
- `trainer.py`: Training loop with privacy tracking and evaluation.
- `transformers_support.py`: Patches to adapt HuggingFace modules for compatibility with Opacus.
- `utils.py`: Miscellaneous utility functions (e.g., logging, checkpointing).

## Environment Setup
```
conda create -n pyvacy python=3.11
conda activate pyvacy
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.39.3
pip install sentencepiece
pip install peft
pip install datasets
pip install evaluate
pip install opacus
pip install wandb
pip install pandas
pip install scikit-learn
```
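
After installation, a quick check can confirm that the environment matches the commands above. The following is a minimal sketch, not part of the repository: `EXPECTED` and `check_environment` are hypothetical names, and the only pinned versions are the ones that appear in the setup commands. It relies on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Versions pinned in the setup commands above; None means "any version".
EXPECTED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "torchaudio": "2.1.2",
    "transformers": "4.39.3",
    "sentencepiece": None,
    "peft": None,
    "datasets": None,
    "evaluate": None,
    "opacus": None,
    "wandb": None,
    "pandas": None,
    "scikit-learn": None,
}


def check_environment(expected=EXPECTED, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for package, pinned in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        # Ignore local build tags such as "+cu118" when comparing to the pin.
        if pinned is not None and installed.split("+")[0] != pinned:
            problems.append(f"{package}: found {installed}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the setup commands"]:
        print(line)
```

Run it inside the activated `pyvacy` environment; any mismatch is reported one line per package.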

[Optional] To use the environment from Jupyter Notebook, register it as a kernel:

```
conda install ipykernel
ipython kernel install --user --name=pyvacy
```

## Training Commands
To launch training, run:
```
bash lp_train_llama.sh <CUDA_ID> <DATASET_NAME> <EPSILON> <NOISE_SCALE> <CLIP_NORM> <BATCH_SIZE> <MODEL_NAME>
```
- `MODEL_NAME` options:
  - `base` -> `bert-base-uncased`
  - `large` -> `bert-large-uncased`
  - default -> `meta-llama/Llama-2-7b-hf`

The script wraps calls to `Lora_SeqLP_prv.py`, using the parameters defined in `arguments.py`.
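
The positional-argument order and the `MODEL_NAME` aliases can be illustrated with a short Python sketch. It is not code from the repository: `resolve_model` and `build_command` are hypothetical helpers, and the sketch assumes that any `MODEL_NAME` other than `base` or `large` falls through to the Llama-2 default. The shell scripts themselves may resolve aliases differently.

```python
import shlex

# Aliases listed above. Assumption: any value other than "base" or "large"
# falls through to the Llama-2 default.
MODEL_ALIASES = {
    "base": "bert-base-uncased",
    "large": "bert-large-uncased",
}
DEFAULT_MODEL = "meta-llama/Llama-2-7b-hf"


def resolve_model(alias):
    """Map a MODEL_NAME alias to a HuggingFace model identifier."""
    return MODEL_ALIASES.get(alias, DEFAULT_MODEL)


def build_command(cuda_id, dataset, epsilon, noise_scale, clip_norm,
                  batch_size, model_name, script="lp_train_llama.sh"):
    """Assemble the seven positional arguments in the documented order."""
    args = [cuda_id, dataset, epsilon, noise_scale, clip_norm,
            batch_size, model_name]
    return ["bash", script] + [str(a) for a in args]


cmd = build_command(0, "sports", -1, 0.47, 1, 32, "base",
                    script="lp_train_pvgalm_node_adaptive.sh")
print(shlex.join(cmd))        # the command line as it would be typed
print(resolve_model("base"))  # the model the alias refers to
```

To launch a run from Python, the resulting list could be passed to `subprocess.run(cmd, check=True)` from the repository root.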

### Reproducing Table 1 Results
In the adaptive-clipping commands below, `x` in the first position stands for `<CUDA_ID>`; replace it with the index of the GPU to use.

**Epsilon = 10 with Adaptive Clipping**
```
bash lp_train_pvgalm_node_adaptive.sh x sports -1 0.47 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x cloth -1 0.448 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x mag_cn -1 0.541 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x mag_us -1 0.532 1 32 base
```
**Epsilon = 4 with Adaptive Clipping**

```
bash lp_train_pvgalm_node_adaptive.sh x sports -1 0.61 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x cloth -1 0.583 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x mag_cn -1 0.71 1 32 base
bash lp_train_pvgalm_node_adaptive.sh x mag_us -1 0.705 1 32 base
```
**Epsilon = 10 with Standard Clipping**
```
bash lp_train_pvgalm_node_standard.sh 0 sports -1 0.835 1 32 base
bash lp_train_pvgalm_node_standard.sh 1 cloth -1 0.795 1 32 base
bash lp_train_pvgalm_node_standard.sh 2 mag_us -1 0.964 1 32 base
bash lp_train_pvgalm_node_standard.sh 3 mag_cn -1 0.958 1 32 base
```
**Epsilon = 4 with Standard Clipping**
```
bash lp_train_pvgalm_node_standard.sh 0 sports -1 1.08 1 32 base
bash lp_train_pvgalm_node_standard.sh 1 cloth -1 1.05 1 32 base
bash lp_train_pvgalm_node_standard.sh 2 mag_us -1 1.13 1 32 base
bash lp_train_pvgalm_node_standard.sh 3 mag_cn -1 1.155 1 32 base
```

**Notes**
- This repository integrates Opacus with efficient per-loss-term gradient clipping and privacy accounting, compatible with HuggingFace’s transformer models.
- The noise scale used in training is computed from the target privacy budget (ε) using the scripts provided in the `node_dp_accounting/` directory.
- Make sure your wandb account is configured properly if logging is enabled.
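
To make the roles of `<CLIP_NORM>` and `<NOISE_SCALE>` concrete, here is a minimal sketch of a generic DP-SGD clip-and-noise step in plain Python. It is not the repository's implementation and does not capture the entity-level sensitivity analysis or the adaptive clipping variant. It assumes that the noise scale acts as a multiplier on the clipping norm, as in standard DP-SGD, and `clip` and `private_gradient` are hypothetical names.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def private_gradient(per_term_grads, clip_norm, noise_scale, rng=random):
    """Clip each per-term gradient, sum, add Gaussian noise, and average."""
    dim = len(per_term_grads[0])
    total = [0.0] * dim
    for grad in per_term_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Assumption: noise standard deviation = noise_scale * clip_norm,
    # as in generic DP-SGD.
    std = noise_scale * clip_norm
    noisy = [t + rng.gauss(0.0, std) for t in total]
    return [n / len(per_term_grads) for n in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
print(clip(grads[0], 1.0))         # first gradient rescaled to norm ~1
print(private_gradient(grads, clip_norm=1.0, noise_scale=0.47,
                       rng=random.Random(0)))
```

Clipping bounds how much any single loss term can move the update, and the added noise is calibrated to that bound; a larger noise scale corresponds to a smaller ε for the same training schedule.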
