# MEMORY MAKES THE POISON: OVER MEMORIZATION DRIVES VISUAL DATA POISONING IN LVLMS



# 1. Environment Setup

We follow the environment setup of ShadowCast [1]:


```
cd LLaVA/
conda create -n VLM_Poisoning python=3.10 -y
conda activate VLM_Poisoning
pip install --upgrade pip # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
conda install -c conda-forge cudatoolkit-dev -y
pip install flash-attn --no-build-isolation 
```

```
pip install kornia
pip install --force-reinstall -v "openai==1.3.1"
pip install -U accelerate
```

We use Azure OpenAI's GPT or Gemini to craft texts and to evaluate the attack success rate. To use Azure OpenAI's GPT, provide the key and endpoint (e.g., in `~/.bashrc`) as follows:

```
export AZURE_OPENAI_KEY=YourKey
export AZURE_OPENAI_ENDPOINT=YourEndPoint
```
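
Before running caption crafting or ASR evaluation, it can help to confirm that both variables are visible to Python. The snippet below is a minimal sketch of such a check; the helper name `missing_azure_vars` is hypothetical and not part of this repository.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Azure OpenAI credentials found.")
```
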
To use Gemini, follow the [official instructions from Gemini](https://ai.google.dev/gemini-api/docs/quickstart).


# 2. Data preparation

## 2.1. For RejectShield defense against ShadowCast Attack 
We adopt the same experimental setup as ShadowCast [1].

### 2.1.1. Task Data

Download the data [provided by the ShadowCast paper](https://drive.google.com/file/d/1kuptRNTe4t_1Sbx-emMl4AlHNrQYbEw0/view?usp=share_link) and store it in the `./data/` folder, which should contain two subfolders: `./data/clean_data` and `./data/task_data`. Details of this data can be found in [1].
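
A quick way to confirm the folder layout after extraction is sketched below. It checks only for the two subfolders named above; the helper name `missing_data_folders` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Subfolders named in the instructions above.
EXPECTED_SUBFOLDERS = ("clean_data", "task_data")


def missing_data_folders(data_root="./data"):
    """Return the expected subfolders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_folders()
    print("Missing folders:", missing if missing else "none")
```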

### 2.1.2. Crafting poison text

We provide code to craft poison texts: LLaVA-1.5 generates the captions, which are then refined by GPT-3.5-Turbo. The generated texts are provided in, e.g., `data/task_data/Biden_base_Trump_target/base_train/cap.json`.

### 2.1.3. Crafting poison images

Run `bash poison_llava.sh`. Note that:
- You can modify `task_name` for different attack tasks.
- The images will be saved to, e.g., `data/poisons/llava/healthyFood_base_hamburgerFries_target`.

## 2.2. For Data Memorization  

We follow the same data preparation as in Sec. 2.1, but skip crafting poison images (Sec. 2.1.3). Instead, we use the original images without any adversarial perturbation.

# 3. Training RejectShield

## 3.1. Set up the environment

Run this command to create a new conda environment. 

`conda env create -f environment.yml`

Now activate the created conda environment. 

`conda activate pfd`


## 3.2. Download the models

Download the [public Xception network from [3]](https://huggingface.co/spaces/asdasdasdasd/Face-forgery-detection/blob/main/xception-b5690688.pth) and put it in the `./pfd/networks` folder.

Download the [public checkpoints of the ImageNet100 fine-tuned model from [4]](https://drive.google.com/file/d/1z6qO4ABCM8xNYuPq5XwZujmcev04Wxun/view) and put them in the `./pfd/checkpoints` folder.

## 3.3. Prepare fine-tuning data

Select 500 random images from COCO and put them in the `./pfd/data` folder.

Download the [public adversarially attacked ImageNet100 images from the `imagenet100_adv_resnet50_test` folder in [4]](https://drive.google.com/drive/folders/1LNanBnj8_g34vhWl6ny8uWH48kG7HoCp) and unzip them into the `./pfd` folder.

Download the [public distribution file from [5]](https://drive.google.com/file/d/1EsYR4QyioLjB_fIlivfBygehgtT5uN4V/view?usp=sharing) and put it in the `./pfd/data/dist` folder.

## 3.4. Fine-tune the model
Run this command to fine-tune the model:

`python pfd/main.py --config pfd/configs/datasets/general/COCO500_train.yml pfd/configs/pipelines/train/DIS_train_ImageNet100.yml --force_merge True --preprocessor.name ImageNet`

Our checkpoint is provided as an attachment in the supplementary material.


# 4. Training LVLMs

Once the injected datasets are created, the training steps below are similar for both RejectShield and Data Memorization.


## 4.1. Creating poisoned training data

Following [1], we inject M randomly selected poison samples into the clean data, where M takes values in {0, 5, 10, 20, 30, 50, 100, 150, 200}. The final data will be saved to, e.g., `data/poisoned_training_data/llava/clean-images/cc_sbu_align-Biden_base_Trump_target/poison_100-seed_0.json`.
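
The injection step can be pictured with the sketch below. It is an illustration under stated assumptions, not the repository's script: the function names `inject_poison` and `save_mixture` are hypothetical, samples are assumed to be JSON-serializable records, and poison samples are simply appended to the clean ones. Only the `poison_<M>-seed_<seed>.json` file name follows the example path above.

```python
import json
import random
from pathlib import Path


def inject_poison(clean_samples, poison_samples, num_poison, seed=0):
    """Return the clean samples plus num_poison randomly chosen poison samples."""
    if num_poison > len(poison_samples):
        raise ValueError("num_poison exceeds the number of available poison samples")
    rng = random.Random(seed)  # local RNG so each seed gives a reproducible draw
    chosen = rng.sample(poison_samples, num_poison)
    return list(clean_samples) + chosen


def save_mixture(samples, out_dir, num_poison, seed):
    """Write the mixture to <out_dir>/poison_<M>-seed_<seed>.json and return the path."""
    out_path = Path(out_dir) / f"poison_{num_poison}-seed_{seed}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(samples, indent=2))
    return out_path
```

Using a seeded local generator means that the same seed always selects the same poison subset, which matches the `seed_0` suffix in the example file name.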

## 4.2. Applying RejectShield defense

### 4.2.1. Detect poisoned samples

Run this command:

`python pfd/main.py --config pfd/configs/datasets/general/COCO500_test.yml pfd/configs/pipelines/test/test.yml --force_merge True --preprocessor.name ImageNet`

### 4.2.2. Filter out poisoned samples

The above command generates CSV files that list:
- clean images identified as clean
- poison images identified as poison

Run the script below to remove the identified poison images, changing the task name as needed. This assumes you have already run `prepare_training_data_with_poison.py` for the task you are about to fine-tune.

`python remove_poisons.py --task_name Biden2Trump` 
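
Conceptually, the filtering step drops every training sample whose image was flagged by the detector. The sketch below illustrates that idea only; it is not the logic of `remove_poisons.py`. The CSV column names (`image`, `prediction`), the label value `poison`, and the sample key `image` are all assumptions made for this example.

```python
import csv
import io


def flagged_images(csv_text, image_col="image", label_col="prediction", poison_label="poison"):
    """Collect image names whose detector label equals poison_label (assumed schema)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[image_col] for row in reader if row[label_col] == poison_label}


def drop_flagged(samples, flagged, image_key="image"):
    """Keep only the training samples whose image was not flagged."""
    return [s for s in samples if s[image_key] not in flagged]


if __name__ == "__main__":
    detector_csv = "image,prediction\na.jpg,clean\nb.jpg,poison\n"
    training = [{"image": "a.jpg"}, {"image": "b.jpg"}]
    print(drop_flagged(training, flagged_images(detector_csv)))
```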

## 4.3. Training models

Switch back to the previous conda environment to train the models.

`conda deactivate`

`conda activate VLM_Poisoning`

Run the following script, modifying its parameters for different tasks (`task_name`):

```
bash scripts/train_llava_lora_ours.sh
``` 

The trained models are saved to, e.g., `checkpoints/llava/clean-images/cc_sbu_align-Biden_base_Trump_target/poison_100-seed_0`.
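
When training across several poison counts, a small helper can list which checkpoint folders already exist. The sketch below infers the directory layout from the single example path above, so treat the pattern as an assumption; the helper name `checkpoint_dir` is hypothetical.

```python
from pathlib import Path

# Poison counts listed in Sec. 4.1.
POISON_COUNTS = (0, 5, 10, 20, 30, 50, 100, 150, 200)


def checkpoint_dir(task_name, num_poison, seed=0, root="checkpoints"):
    """Build a checkpoint path following the layout of the example path above."""
    return (
        Path(root) / "llava" / "clean-images"
        / f"cc_sbu_align-{task_name}"
        / f"poison_{num_poison}-seed_{seed}"
    )


if __name__ == "__main__":
    for m in POISON_COUNTS:
        path = checkpoint_dir("Biden_base_Trump_target", m)
        status = "found" if path.is_dir() else "missing"
        print(f"{path}: {status}")
```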

# 5. Evaluation

Once the injected datasets are created, the evaluation steps below are similar for both RejectShield and Data Memorization.

## 5.1. Attack success rate (ASR) evaluation
> For healthyFood_base_hamburgerFries_target and kidSports_base_kidVideoGame_target, Azure OpenAI's GPT or Gemini is needed to compute ASR. 

Run the following script, modifying its parameters for different tasks (`task_name`) and different input prompts (`prompt_list`):
```
bash scripts/eval_poison_llava_ours.sh
```

The evaluation result will be saved in the poisoned models' checkpoint folder, e.g., `checkpoints/llava/clean-images/cc_sbu_align-Biden_base_Trump_target/poison_100-seed_0/eval/eval_poison.log`. 
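
As a rough illustration of the metric, ASR can be read as the fraction of model responses that exhibit the target concept. The sketch below uses simple case-insensitive keyword matching, which is an assumption made for this example; the repository's evaluation script may score responses differently, and the two tasks noted above rely on GPT or Gemini as a judge rather than on keywords. The function name `attack_success_rate` is hypothetical.

```python
def attack_success_rate(responses, target_keyword):
    """Fraction of responses that mention target_keyword (case-insensitive)."""
    if not responses:
        raise ValueError("responses must not be empty")
    keyword = target_keyword.lower()
    hits = sum(keyword in response.lower() for response in responses)
    return hits / len(responses)


if __name__ == "__main__":
    sample_responses = [
        "This is Donald Trump.",
        "This is Joe Biden.",
        "A photo of trump at a rally.",
        "A man in a suit.",
    ]
    print(attack_success_rate(sample_responses, "Trump"))
```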

## 5.2. Benchmark evaluation

We follow LLaVA [2] and ShadowCast [1] to evaluate the fine-tuned models on standard benchmarks.

- Download the datasets according to the official LLaVA guide [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md). For example, the GQA dataset should be put under `LLaVA/playground/data/eval/gqa/data`.

- Run `bash benchmark/benchmark_llava_gqa.sh` and `bash benchmark/benchmark_llava_vizwiz.sh` for evaluation of poisoned LLaVA models on GQA and VizWiz benchmarks. 
The results will be saved to, for example, `checkpoints/llava/clean-images/cc_sbu_align-Biden_base_Trump_target/poison_100-seed_0/eval/gqa/result.log`.

# 6. Demo

To set up a simple demo, run:

```bash
# Install Gradio
pip install gradio

# Run the demo
python demo.py
```

This will launch a Gradio interface where you can:
1. Upload an image
2. Ask questions about the image
3. See how the model responds to your queries

The demo will help you visualize how the poisoned model behaves compared to the clean model. You can test various scenarios and observe the model's responses to understand the effects of prompt overmemorization.

Note: Make sure you have a trained model checkpoint available before running the demo. The default checkpoint path is set to `checkpoints/llava/clean-images/cc_sbu_align-Biden_base_Trump_target/poison_100-seed_0/`.




# References

[1] Xu, Yuancheng, et al. "Shadowcast: Stealthy data poisoning attacks against vision-language models." NeurIPS 2024.

[2] Liu, Haotian, et al. "Visual instruction tuning." NeurIPS 2023.

[3] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." CVPR 2017.

[4] Wang, Qian, et al. "Detecting adversarial data using perturbation forgery." CVPR 2025.

