# llm-spec-attacks

## Experiment 1: LADE

- folder `Lookahead Decoding`

- If you want to collect different data or change LADE behavior (which might be necessary for some of the Mitigation experiments), consult and uncomment lines 568 and 578 in `lade/decoding.py`.

### Install from the source

To install the requirements:
```bash
pip install -r requirements.txt &&
pip install -e .
```

### Fingerprinting Attack

```bash
chmod +x ./run.sh
./run.sh
mkdir model
mkdir labels
chmod +x ./m_eval.sh
./m_eval.sh
```
#### Note

- `run.sh` collects the traces for the attack. Change **kind** to choose the set of prompts, **temp** to choose the temperature, and **trials** to choose how many traces to collect.

- Set `USE_LADE=1 LOAD_LADE=1` to enable speculative decoding with LADE; set `USE_LADE=0` to disable speculative decoding completely.

- `m_eval.sh` runs the attack and collects accuracy scores. Change **kind** to choose which experiment (set of prompts) to run, **temp** to choose the temperature, and **trials** to set the number of traces. Set **Inkind** to the same value as **kind**, except for `kind='AK1'`, where you must set `Inkind='EK1'`.
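
The **kind**/**Inkind** pairing rule above can be sketched as a small shell helper. `inkind_for` is a hypothetical name, not something the repository provides, and how `m_eval.sh` actually reads the two values (for example, as variables edited inside the script) is an assumption you should check against the script itself.

```bash
# Hypothetical helper (not part of the repository): given a kind,
# print the Inkind value that the pairing rule in the note requires.
inkind_for() {
  case "$1" in
    AK1) printf '%s\n' "EK1" ;;   # the one documented exception
    *)   printf '%s\n' "$1" ;;    # otherwise Inkind equals kind
  esac
}
```

For example, `Inkind="$(inkind_for "$kind")"` keeps the two values consistent before you copy them into `m_eval.sh`.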

### Parameter Leakage Attack

```bash
cd parameter_attack_experiment
./run.sh
```

#### Note

- You can change the hyperparameters G and N by passing arguments in `run.sh`.

### Mitigations (Section 6)

See `exp_trace.py` for details, and change the corresponding block in both `train.py` and `test.py`.

## Experiment 2: BiLD

- This experiment is in the `bild-fairness` branch.

- Go to the `bild-llama` folder.

- Everything else is the same as the LADE fingerprinting attack, except that the data collection script is `collect_data.py` and the attack execution script is `m_eval.sh`.

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python collect_data.py
```

## Experiment 3: REST

### Fingerprinting Attack

- this is in the `REST Fingerprinting` folder
  
- go to `REST/trace` folder (this might need the [REST code](https://github.com/FasterDecoding/REST))

- Install in the following way:

```bash
conda create -n rest python=3.9
conda activate rest
pip3 install -r requirements.txt
pip3 install DraftRetriever/wheels/draftretriever-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl
```
The `.whl` file can be found [here](https://github.com/FasterDecoding/REST/tree/main/DraftRetriever/wheels).

Then build the datastore:
```bash
cd datastore
python3 get_datastore_chat.py --model-path lmsys/vicuna-7b-v1.5
cd ../traces
```

- Everything else is the same as the LADE fingerprinting attack, except that the data collection script is `trace.sh` and the attack execution script is `mit_eval.sh`.

### Datastore Leakage Attack

See folder `REST Datastore Leakage` and go to folder `REST`

## Experiment 4: vLLM

- Make sure you have Wireshark installed on the client machine.

On your server machine, run:
```bash
python3.10 -m venv ~/.venv_spec
source ~/.venv_spec/bin/activate
pip install vllm joblib scikit-learn
git checkout david-vllm
python3 transform.py
tmux new -s 0

# EAGLE experiment
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model meta-llama/Meta-Llama-3-8B-Instruct --seed 42 -tp 1 --speculative_model EAGLE-LLaMA3-Instruct-8B --use-v2-block-manager --num_speculative_tokens 5 --speculative_draft_tensor_parallel_size 1 --gpu_memory_utilization 0.8&
```
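
One way to tell when the server has finished starting is to poll it from the server machine. The sketch below assumes `curl` is installed and that the vLLM OpenAI-compatible server answers on its `/health` route at port 8000. The function name, retry count, and delay are illustrative choices, not part of this repository.

```bash
# Poll a URL until it returns an HTTP success status or the attempts run out.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_server() {
  url="${1:-http://localhost:8000/health}"
  tries="${2:-60}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure, --max-time: do not hang
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_for_server && echo "server ready"` blocks until the check succeeds or about five minutes have passed with the defaults above.
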
Wait until the server is fully set up, then on the client side run:
```bash
# Forward local port 8001 to port 8000 on the server:
ssh -fN -L 8001:localhost:8000 <your server>

# Verify tunnel:
lsof -i tcp:8001

mkdir EAGLE_output
mkdir wireshirk_EAGLE_output
python3 run.py
```
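
The traffic through the tunnel has to be captured while `run.py` runs. If you prefer the command line to the Wireshark GUI, a minimal sketch using `tshark` (the CLI that ships with Wireshark) could look like the following. The function names and the output file name are hypothetical, and whether the rest of the pipeline expects the capture in a particular location or format is not covered here. The loopback interface is `lo0` on macOS and usually `lo` on Linux.

```bash
# Build a capture filter (BPF syntax) for one TCP port. This is the
# capture-time counterpart of the display filter `tcp.port == 8001`.
capture_filter() {
  printf 'tcp port %s' "$1"
}

# Start capturing tunnel traffic; stop with Ctrl+C once run.py has finished.
# Arguments (all optional): interface, port, output file (illustrative name).
start_capture() {
  iface="${1:-lo0}"
  port="${2:-8001}"
  out="${3:-eagle_capture.pcapng}"
  tshark -i "$iface" -f "$(capture_filter "$port")" -w "$out"
}
```

Run `start_capture` in a separate terminal before starting `python3 run.py`.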

- You can set up `<your server>` by pressing `Ctrl + Shift + P` in VSCode and editing your `Remote-SSH configuration file`.

- Before running `python3 run.py`, open Wireshark, start listening on the `Loopback:lo0` interface, and apply `tcp.port == 8001` as the display filter.

- To run the attack, do
```bash
mkdir model
mkdir labels
chmod +x ./test.sh
./test.sh
```

- Finally, `pcc.py` calculates the Pearson correlation coefficient.

- To run the **One Token Per Packet** experiment, which simulates Google DeepMind's experiment, run the following on your server:
```bash
python3 -m venv .venv
source .venv/bin/activate
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_CUDA_COMPILER=<your nvcc compiler> \
  -DCUDA_TOOLKIT_ROOT_DIR=<your CUDA Toolkit> \
  -DCUDAToolkit_INCLUDE_DIR=/usr/include \
  -DCMAKE_CUDA_ARCHITECTURES=80 \
  -DVLLM_TARGET_DEVICE=cuda \
  -DVLLM_PYTHON_EXECUTABLE="$HOME/Myvllm/.venv/bin/python" \
  ..
ninja
python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 8000 \
  --model lmsys/vicuna-13b-v1.3 \
  --seed 42 \
  -tp 1 \
  --use-v2-block-manager \
  --gpu_memory_utilization 0.8 \
  --speculative-config '{"model":"yuhuili/EAGLE-Vicuna-13B-v1.3","num_speculative_tokens":5}' \
  > /dev/null 2>&1 &
```
The client-side steps are the same as in the EAGLE experiment above.
