
# Bit-Flip Attack on LLMs

This project investigates weight-level bit-flip attacks on large language models (LLMs) under different quantization settings (`int8`, `fp16`). It explores model robustness and identifies optimal flipping configurations for triggering abnormal behaviors with minimal perturbation.
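To make the idea of a weight-level bit flip concrete, here is a minimal sketch that flips a single bit in one `fp16` value and one `int8` value using only the Python standard library. It is an illustration, not the code in `attack_util.py`; the helper names `flip_bit_fp16` and `flip_bit_int8` are hypothetical.

```python
import struct


def flip_bit_fp16(value: float, bit: int) -> float:
    """Flip one bit of an IEEE 754 half-precision value.

    bit 0 is the least significant mantissa bit, bit 15 is the sign bit.
    (Hypothetical helper for illustration only.)
    """
    if not 0 <= bit <= 15:
        raise ValueError("bit must be in [0, 15] for fp16")
    # Reinterpret the 2-byte float as an unsigned 16-bit integer.
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    # XOR with a one-hot mask toggles exactly one bit.
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))
    return flipped


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit integer (two's complement).

    bit 0 is the least significant bit, bit 7 is the sign bit.
    (Hypothetical helper for illustration only.)
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in [0, 7] for int8")
    (raw,) = struct.unpack("<B", struct.pack("<b", value))
    (flipped,) = struct.unpack("<b", struct.pack("<B", raw ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    # Flipping the top exponent bit of a small fp16 weight makes it huge.
    print(flip_bit_fp16(0.5, 14))
    # Flipping the sign bit of an int8 weight changes both sign and magnitude.
    print(flip_bit_int8(3, 7))
```

A single flip in a high-order bit can change a weight by several orders of magnitude, which is why a handful of well-chosen flips can be enough to alter model behavior.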



## 📁 Directory Structure
.
├── data/
│   ├── clean/        # Stores baseline (non-attacked) model outputs for comparison
│   ├── fp16/         # Stores flipped weights and corresponding outputs (float16 models)
│   └── int8/         # Stores flipped weights and corresponding outputs (int8 models)
├── inference.py      # Script for running inference and bit-flip attacks
├── environment.yml   # Conda environment dependencies
├── templates.py      # Contains templates for model prompts or configurations
├── attack_util.py    # Utility functions for performing bit-flip attacks
└── README.md         # This file, providing project documentation


The bits we identified for flipping are recorded in `./data/fp16/*flip_record.json` and `./data/int8/*flip_record.json`. The corresponding model outputs are recorded in `./data/fp16/*results.json` and `./data/int8/*results.json`. Summary files that record the average output length and the outputs that reach the maximum limit are in `./data/results_fp16.csv` and `./data/results_int8.csv`.
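The summary statistics described above can be sketched as follows. This is only an illustration under stated assumptions: it takes a list of output lengths that you have already extracted from a `*results.json` file (the JSON schema is not documented here, so no parsing is shown), and the function name `summarize_outputs`, the `max_length` parameter, and the returned keys are all hypothetical.

```python
from statistics import mean


def summarize_outputs(output_lengths, max_length):
    """Summarize generation lengths for one attack configuration.

    output_lengths: lengths of the generated outputs (e.g. in tokens).
    max_length:     the generation limit used during inference (assumed known).
    """
    if not output_lengths:
        raise ValueError("output_lengths must not be empty")
    # An output "reaches the limit" if it is at least as long as the cap.
    at_limit = sum(1 for n in output_lengths if n >= max_length)
    return {
        "avg_length": mean(output_lengths),
        "num_at_limit": at_limit,
        "frac_at_limit": at_limit / len(output_lengths),
    }


if __name__ == "__main__":
    # Made-up lengths purely to show the call; not real experiment data.
    print(summarize_outputs([100, 2048, 2048, 52], max_length=2048))
```

Comparing these numbers between `data/clean/` and an attacked run gives a quick indication of whether the flips suppressed early termination.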

## ⚙️ Environment Setup

1. Create the Conda environment using the provided dependency file:

```bash
conda env create -f environment.yml
conda activate bfallm
```

 
2. **Important for INT8 models:**

By default, the `transformers` integration for `bitsandbytes` does not quantize the last layer. To quantize every layer, change the return statement of `get_keys_to_not_convert` as shown:


```python
# File: ~/miniconda3/envs/bfallm/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py
# Line: 321
# Function name: get_keys_to_not_convert

# Original:
return [last_layer_module]

# Modify to:
return []
```
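The install path and line number above depend on your Conda location and on the installed `transformers` version, so they may not match your machine exactly. The sketch below shows one way to look up where a function is defined using the standard `importlib` and `inspect` modules; the helper name `locate_function` is hypothetical, and the lookup assumes the function still lives in `transformers.integrations.bitsandbytes` in your installed version.

```python
import importlib
import inspect


def locate_function(module_name: str, function_name: str):
    """Return (source_file, first_line_number) for a function in a module.

    (Hypothetical helper for illustration only.)
    """
    module = importlib.import_module(module_name)
    func = getattr(module, function_name)
    # getsourcelines returns the source lines and the 1-based starting line.
    _, first_line = inspect.getsourcelines(func)
    return inspect.getsourcefile(func), first_line


if __name__ == "__main__":
    try:
        path, line = locate_function(
            "transformers.integrations.bitsandbytes",
            "get_keys_to_not_convert",
        )
        print(f"{path}:{line}")
    except (ImportError, AttributeError, OSError) as exc:
        # Raised if transformers is missing or the function has moved.
        print(f"Could not locate the function: {exc}")
```

Run it inside the activated `bfallm` environment so that it reports the copy of `transformers` that `inference.py` will actually import.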


---



## ✅ Optimal Bit-Flip Configurations 


The table below summarizes the minimum number of bit flips needed to suppress early termination or trigger other unintended behavior:

| Model | Best # Bit Flips | Dtype | 
| --- | --- | --- | 
| Qwen/Qwen1.5-1.8B | 4 | int8 | 
| Qwen/Qwen1.5-1.8B | 7 | fp16 | 
| meta-llama/Meta-Llama-3-8B-Instruct | 3 | int8 | 
| meta-llama/Meta-Llama-3-8B-Instruct | 5 | fp16 | 
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | 13 | int8 | 
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | 3 | fp16 | 
| Qwen/Qwen2.5-14B-Instruct-1M | 7 | int8 | 
| Qwen/Qwen2.5-14B-Instruct-1M | 6 | fp16 | 



---



## 🚀 Running Inference 


Run the following commands to perform inference with or without flipped weights:



```bash
# Qwen1.5 1.8B - int8
python inference.py --model Qwen/Qwen1.5-1.8B --flip_num_start 4 --flip_num_end 4 --dtype int8

# Qwen1.5 1.8B - fp16
python inference.py --model Qwen/Qwen1.5-1.8B --flip_num_start 7 --flip_num_end 7 --dtype fp16

# Meta LLaMA 3 - int8
python inference.py --model meta-llama/Meta-Llama-3-8B-Instruct --flip_num_start 3 --flip_num_end 3 --dtype int8

# Meta LLaMA 3 - fp16
python inference.py --model meta-llama/Meta-Llama-3-8B-Instruct --flip_num_start 5 --flip_num_end 5 --dtype fp16

# DeepSeek 8B - int8
python inference.py --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --flip_num_start 13 --flip_num_end 13 --dtype int8

# DeepSeek 8B - fp16
python inference.py --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --flip_num_start 3 --flip_num_end 3 --dtype fp16

# Qwen 2.5 14B - int8
python inference.py --model Qwen/Qwen2.5-14B-Instruct-1M --flip_num_start 7 --flip_num_end 7 --dtype int8

# Qwen 2.5 14B - fp16
python inference.py --model Qwen/Qwen2.5-14B-Instruct-1M --flip_num_start 6 --flip_num_end 6 --dtype fp16

```
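If you want to run all of the configurations above in one go, the commands can be generated from the table of optimal configurations. The following is a minimal sketch, not part of the repository: the names `CONFIGS`, `build_command`, and `run_all` are hypothetical, and only the command-line flags shown above are used. By default it prints the commands without executing them.

```python
import shlex
import subprocess
import sys

# (model, dtype, number of bit flips), copied from the table above.
CONFIGS = [
    ("Qwen/Qwen1.5-1.8B", "int8", 4),
    ("Qwen/Qwen1.5-1.8B", "fp16", 7),
    ("meta-llama/Meta-Llama-3-8B-Instruct", "int8", 3),
    ("meta-llama/Meta-Llama-3-8B-Instruct", "fp16", 5),
    ("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "int8", 13),
    ("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "fp16", 3),
    ("Qwen/Qwen2.5-14B-Instruct-1M", "int8", 7),
    ("Qwen/Qwen2.5-14B-Instruct-1M", "fp16", 6),
]


def build_command(model: str, dtype: str, flips: int) -> list:
    """Build the argument list for one inference.py run."""
    return [
        sys.executable, "inference.py",
        "--model", model,
        "--flip_num_start", str(flips),
        "--flip_num_end", str(flips),
        "--dtype", dtype,
    ]


def run_all(dry_run: bool = True) -> None:
    """Print every command and, unless dry_run is set, execute it."""
    for model, dtype, flips in CONFIGS:
        cmd = build_command(model, dtype, flips)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
```

Call `run_all()` to preview the commands, or `run_all(dry_run=False)` from the repository root to execute them one after another.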



---



## 📌 Notes 

 
- Ensure the correct precision setting (`int8` or `fp16`) is used for each experiment.
 
- All model outputs are saved under `data/<dtype>`.
 



---

