# LLM Obfuscation and Attack Framework

This directory contains implementations of several LLM obfuscation methods, along with attack code for evaluating how well each method holds up in different attack scenarios.

## Project Structure

```
obf_llm/
├── attack/                # Attack implementation and evaluation code
│   ├── data/              # Data for attack experiments
│   │   ├── ratios/        # Data for ratio-based experiments
│   │   ├── tsqp/          # Data for TSQP experiments
│   │   └── victim/        # Victim model data
│   ├── models/            # Model definitions for attack
│   │   └── model.py       # Main model implementation
│   ├── arrowmatch.py      # ArrowMatch attack implementation
│   ├── arrowmatch_finqa.py 
│   ├── arrowmatch_goemotions.py 
│   ├── arrowmatch_pubmedqa.py 
│   ├── reconstruct_pi_finqa.py # PI reconstruction for FinQA
│   ├── reconstruct_pi_goemotions.py
│   ├── reconstruct_pi_mnli.py 
│   ├── reconstruct_pi_wic.py
│   ├── recover_dataset.py # Dataset construction for recovery
│   ├── recover_dataset_finqa.py
│   ├── recover_dataset_goemotions.py
│   └── recover_dataset_pubmedqa.py
├── config/                # DeepSpeed configuration
│   ├── ds_config.json     
│   └── ds_config_attack.json 
├── data/                  # Data directory
├── models/                # Model directory
├── obfuscate/             # Obfuscation techniques implementation
│   ├── __init__.py        
│   ├── arrowcloak.py     
│   ├── coreguard.py      
│   ├── groupcover.py    
│   ├── ours.py            
│   ├── shadownet.py    
│   ├── soter.py          
│   ├── translinkguard.py
│   └── tsqp.py           
├── outputs/               # Output directory for trained models
├── scripts/               # Shell scripts for experiments
│   ├── attack.sh          # Main attack script
│   ├── attack_ours_block.sh # Attack script for our block-based method
│   ├── attack_ours_ratios.sh # Attack script for our ratio-based method
│   ├── attack_tsqp.sh     # Attack script for TSQP
│   ├── construct_dataset.sh # Dataset construction script
│   ├── construct_dataset_tsqp.sh # Dataset construction for TSQP
│   ├── reconstruct_pi.sh  # PI reconstruction script
│   ├── train.sh           # Main training script
│   └── train_tsqp.sh      # Training script for TSQP
├── training/              # Training-related code
│   ├── utils/             
│   │   ├── create_finqa_dataset.py 
│   │   ├── eval_finqa.py 
│   │   ├── eval_pubmedqa.py 
│   │   ├── inference_finqa.py 
│   │   └── split_pubmedqa.py
│   ├── train.py           
│   ├── train_finqa.py     
│   ├── train_goemotions.py 
│   ├── train_pubmedqa.py  
│   └── train_wic.py       
└── README.md              # This file
```

## Reproducibility Plan

Run all of the commands below from the `obf_llm/` directory.
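As a quick sanity check before running anything, the sketch below tests whether a directory looks like the repository root. It is a hypothetical helper, not part of the repository. The names `EXPECTED_DIRS` and `missing_dirs` are made up for illustration, and the directory names are taken from the project structure listing.

```python
from pathlib import Path

# Top-level code directories, as shown in the project structure listing.
EXPECTED_DIRS = ("attack", "config", "obfuscate", "scripts", "training")


def missing_dirs(root="."):
    """Return the expected top-level directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs()` from the shell's current directory should return an empty list when you are in `obf_llm/`. A non-empty list names the directories that were not found.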

### 1. Download Pre-trained Model Weights and Datasets

#### Pre-trained Models
- Download the base LLM weights from Hugging Face or other model repositories
- Place the model weights in the `models/` directory

#### Datasets
- Download the following datasets:
  - QNLI
  - QQP
  - SST2
  - MNLI
  - WiC
  - GoEmotions
  - FinQA
  - PubMedQA
- Place the datasets in the `data/` directory

### 2. Train Victim Models

The training commands and the hyperparameters we use are in `scripts/train.sh` and `scripts/train_tsqp.sh`.

### 3. Construct the Dataset for Attack

The dataset construction commands are in `scripts/construct_dataset.sh` and `scripts/construct_dataset_tsqp.sh`.

### 4. Test with Obfuscation and Attack

Run the attack scripts (`scripts/attack*.sh`) directly. They already include the model obfuscation step, so no separate obfuscation run is needed.
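Steps 2 to 4 can be chained from one place. The sketch below is a hypothetical driver, not part of the repository. It assumes each script can be launched as `bash <script>` with no arguments, which may not hold if the scripts expect paths or model names to be edited first. The names `PIPELINE`, `build_commands`, and `run_pipeline` are illustrative.

```python
import subprocess

# Script paths come from the scripts/ listing; the order follows steps 2-4.
PIPELINE = (
    "scripts/train.sh",
    "scripts/construct_dataset.sh",
    "scripts/attack.sh",
)


def build_commands(scripts=PIPELINE):
    """Return one `bash <script>` command per pipeline stage."""
    return [["bash", script] for script in scripts]


def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Run each stage in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands(scripts)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

The default is a dry run, so `run_pipeline()` only prints the three commands. Pass `dry_run=False` from the `obf_llm/` directory to execute them. For the TSQP experiments, pass the `*_tsqp.sh` variants as `scripts` instead.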

## Obfuscation Techniques

The `obfuscate/` directory implements the following obfuscation methods for LLMs, one module per method:

- **CoreGuard**: `obfuscate/coreguard.py`
- **Ours**: our proposed obfuscation method, `obfuscate/ours.py`
- **ShadowNet**: `obfuscate/shadownet.py`
- **Soter**: `obfuscate/soter.py`
- **TransLinkGuard**: `obfuscate/translinkguard.py`
- **TSQP**: `obfuscate/tsqp.py`

The directory also contains `arrowcloak.py` and `groupcover.py`.

## Attack Methods

The framework includes two attack methods for evaluating obfuscation effectiveness:

- **ArrowMatch**: A model extraction attack (`attack/arrowmatch*.py`)
- **PI Reconstruction**: Property inference attacks against our method (`attack/reconstruct_pi_*.py`)
