# DP-GPL: Differentially Private Graph Prompt Learning

## Abstract
Graph Neural Networks (GNNs) have shown remarkable performance in various applications. Recently, graph prompt learning has emerged as a powerful GNN training paradigm, inspired by advances in language and vision models. Here, a GNN is pre-trained on public data and then adapted to sensitive tasks using lightweight graph prompts. However, using prompts from sensitive data poses privacy risks. In this work, we are the first to investigate these risks in graph prompts by instantiating a membership inference attack that reveals significant privacy leakage. We also find that the standard privacy method, DP-SGD, fails to provide practical privacy-utility trade-offs in graph prompt learning, likely due to the small number of sensitive data points used to learn the prompts. As a solution, we propose two algorithms, DP-GPL and DP-GPL+W, for differentially private graph prompt learning. Both are based on the PATE framework and generate a graph prompt with differential privacy guarantees. Our evaluation across various graph prompt learning methods, GNN architectures, and pre-training strategies demonstrates that our algorithms achieve high utility at strong privacy, effectively mitigating privacy concerns while preserving the powerful capabilities of prompted GNNs.

This repo contains the source code used in our paper.

## Step 0: Installation
We tested our code with Python 3.12.4. A virtual environment can be created and activated as follows:

```
# Create and activate a new Conda environment named 'DP-GPL'
conda create -n DP-GPL python=3.12.4
conda activate DP-GPL

# Install NumPy, PyTorch, and DGL
conda install numpy
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c dglteam/label/th23_cu121 dgl

# Install additional dependencies
pip install torch_geometric pandas torchmetrics Deprecated 
```
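
To confirm that the environment is complete before running the steps below, a small check like the following can help. It is only a sketch and not part of this repo: the list of import names is an assumption derived from the install commands above (for example, the `Deprecated` package is imported as `deprecated`).

```python
import importlib.util

# Import names assumed from the install commands above; adjust as needed.
REQUIRED = [
    "numpy", "torch", "dgl", "torch_geometric",
    "pandas", "torchmetrics", "deprecated",
]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print(f"Missing packages: {missing}")
    else:
        print("All dependencies found.")
```

The check only looks up whether each package is installed; it does not verify versions or CUDA support.
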
## Step 1: Pre-train GNN models
```
cd src;
python pre_train.py --dataset_name ogbn-arxiv --pre_train_type DGI --gnn_type GAT
```

## Step 2: Conduct graph prompt learning
```
cd src;
python prompt_learning.py --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --task NodeTask --pre_train_type DGI --pre_train_data ogbn-arxiv --shot_num 5
```

## Step 3: Implement DP-GPL on graph prompt learning
### Step 3.1: Train ensemble prompts
```
cd src;
python dp-gpl.py --task NodeTaskPATE --pate --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --number_of_teachers 200 --shot_num 5 --pre_train_type DGI --pre_train_data ogbn-arxiv
```
### Step 3.2: Get noisy labels
```
cd src;
python pate_main.py --pre_train_type DGI --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --shot_num 5 --pre_train_data ogbn-arxiv --number_of_teachers 200
```
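
For intuition, the noisy labeling step in PATE-style methods can be sketched as follows: each teacher prompt votes for a class, noise is added to the vote counts, and the class with the highest noisy count becomes the label. This is an illustrative sketch only, not the code in `pate_main.py`. The function name `noisy_labels`, the use of Gaussian noise, and the parameter `sigma` are assumptions made for this example.

```python
import numpy as np


def noisy_labels(teacher_preds, num_classes, sigma, seed=None):
    """Aggregate teacher votes with Gaussian noise (illustrative sketch).

    teacher_preds: integer array of shape (num_teachers, num_queries),
                   where entry [t, q] is teacher t's predicted class for query q.
    sigma:         standard deviation of the noise added to each vote count
                   (hypothetical parameter name).
    """
    rng = np.random.default_rng(seed)
    preds = np.asarray(teacher_preds)
    # counts[q, c] = number of teachers that voted class c for query q.
    counts = np.stack([
        np.bincount(preds[:, q], minlength=num_classes)
        for q in range(preds.shape[1])
    ])
    # Perturb the vote histogram, then release only the arg-max label.
    noisy_counts = counts + rng.normal(0.0, sigma, size=counts.shape)
    return noisy_counts.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    votes = rng.integers(0, 7, size=(200, 10))  # 200 teachers, 10 queries, 7 classes
    print(noisy_labels(votes, num_classes=7, sigma=20.0, seed=0))
```

A larger `sigma` gives stronger privacy per query but makes the released labels less reliable, which is why a large number of teachers is used.
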
### Step 3.3: Train student prompt
```
cd src;
python dp-gpl.py --task NodeTaskStudentPrompt --pre_train_type DGI --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --shot_num 5 --pre_train_data ogbn-arxiv --pate --student_prompt
```

## Step 4: Implement DP-GPL+W on graph prompt learning
### Step 4.1: Train ensemble prompts
```
cd src;
python dp-gpl-w.py --task NodeTaskWeightedPATE --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --number_of_teachers 200 --shot_num 5 --pre_train_type DGI --pre_train_data ogbn-arxiv --weighted_pate
```
### Step 4.2: Get noisy labels
```
cd src;
python pate_main.py --pre_train_type DGI --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --shot_num 5 --pre_train_data ogbn-arxiv --number_of_teachers 200 --weighted_pate
```
### Step 4.3: Train student prompt
```
cd src;
python dp-gpl-w.py --task NodeTaskStudentPrompt --pre_train_type DGI --dataset_name Cora --gnn_type GAT --prompt_type GPF-plus --shot_num 5 --pre_train_data ogbn-arxiv --weighted_pate --student_prompt
```

## Detailed Usage
There are many arguments that control the operation of our scripts:

```
--dataset_name: ['ogbn-arxiv', 'Cora', 'CiteSeer', 'PubMed'], dataset used to train the GNN models
--gnn_type: ['GAT', 'GCN', 'GraphTransformer'], GNN model's structure
--pre_train_type: ['DGI', 'GraphMAE', 'EdgePreGPPT', 'SimGRACE'], pre-training strategy
--pre_train_data: ['ogbn-arxiv'], pre-training dataset
--prompt_type: ['GPF-plus', 'All-in-one', 'GPPT'], graph prompt learning method
--shot_num: [1, 2, 3, 4, 5], number of shots used to train the prompt
--pate, whether to use DP-GPL
--weighted_pate, whether to use DP-GPL+W
--number_of_teachers: [200], number of teacher prompts in DP-GPL/DP-GPL+W
```
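
When comparing settings, it can be convenient to generate the commands programmatically. The sketch below builds the Step 2 command for each supported `--shot_num` value and prints it without running anything. The helper `prompt_learning_cmd` is hypothetical and not part of this repo; it only uses flags and values that appear in the commands above.

```python
import shlex


def prompt_learning_cmd(shot_num, dataset_name="Cora", gnn_type="GAT",
                        prompt_type="GPF-plus", pre_train_type="DGI",
                        pre_train_data="ogbn-arxiv"):
    """Build the Step 2 command line as a list of arguments (hypothetical helper)."""
    return [
        "python", "prompt_learning.py",
        "--dataset_name", dataset_name,
        "--gnn_type", gnn_type,
        "--prompt_type", prompt_type,
        "--task", "NodeTask",
        "--pre_train_type", pre_train_type,
        "--pre_train_data", pre_train_data,
        "--shot_num", str(shot_num),
    ]


if __name__ == "__main__":
    # Dry run: print one command per shot number instead of executing it.
    for shots in range(1, 6):
        print(shlex.join(prompt_learning_cmd(shots)))
```

To execute a command instead of printing it, the argument list can be passed to `subprocess.run(cmd, cwd="src", check=True)`.
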
