# Concept-Guided Backdoor Attack on Vision-Language Models

## File Structure
- **prepare_concepts**: code to extract concepts for the two attacks  
- **ctp**: Concept Thresholding Poisoning  
- **cgub**: CBL-Guided Unseen Backdoor  

## Dataset
We use **Flickr8k**, **Flickr30k**, **COCO**, and **OK-VQA** in our experiments.  

To download them, please refer to **/ctp/lavis/datasets/download_scripts**. 

For training in BLIP-2, please remember to update the **cache_root** in **/ctp/lavis/configs**




# 1. CTP Attack



We provide an implementation for **BLIP-2**. Before performing backdoor training, the model should be finetuned on the corresponding dataset. For details, please refer to [LAVIS](https://github.com/salesforce/LAVIS).

## i. Auxiliary Classifier

First, use [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336) to obtain the precomputed probabilities (serving as ground truth).  
- Code: `./prepare_concepts/predefined_concepts.py`  
- Example checkpoint: `./prepare_concepts/training_outputs_flickr8k_epoch_50/checkpoints/last.ckpt`  
Then, train a simple MLP-based concept classifier with:  ```./prepare_concepts/train_concept.py```.

## ii. Backdoor Training

### Example  
Run an attack on the concept **dog** with a poisoning rate of 1%:  
```bash
bash ./ctp/scripts/clean/blip2_opt_caption_2.7b/train_caption_flickr_8k.sh
```

To change poison rate, adjust **concept_thresh** . To change target concept, find the target concept's index in **prepare_concepts/predefined_concepts.py**.


# 2. CGUB Attack


We provide code for **LLaVA** architecture.  The same as CTP, before backdoor training, you should first finetune the model on corresponding dataset, refer to [LLaVA](https://github.com/haotian-liu/LLaVA). 

## i. CBL Training

```
bash /cgub/scripts/v1_5/attack/train_cbl.sh
```

**Key hyper-parameters:**

- `data_path`: path to the training data (in LLaVA format)  
- `image_folder`: path to the image directory  
- `pretrain_mm_mlp_adapter`: path to the pretrained clean adapter  
- `concept_strength_file`: concept strength file used to align concept prediction  
- `replace_dict_path`: placeholder path during CBL training  
- `is_attack`: set to **False** for CBL training  

## ii. CGUB attack
```
# train (on Flickr8k, target:cat)
bash ./cgub/scripts/v1_5/attack/cat/attack_cat.sh
# eval (on Flickr8k)
bash ./cgub/scripts/v1_5/eval/eval_cat_clean.sh
# eval (on COCO, for ASR)
bash ./cgub/scripts/v1_5/eval/eval_cat_attack.sh
```

**Important hyper-parameters:**

- `split_dataset`: set to **True** to ensure the *unseen label* is excluded.  
- `is_attack`: set to **True** to enable the attack.  
- `replace_dict`: path to a JSON file.  
  - Example: `./cgub/scripts/v1_5/attack/cat/replace.json`  
  - In this file, `target_concepts` specifies the targeted concepts (top 20).  
  - To generate target concepts for other labels, see:  
    ```
    cgub/top_k_concept_selection.ipynb
    ```
- `keywords_to_split`: path to a JSON file specifying how to filter out the *unseen label* from the training dataset.  
  - Example: for *cat*, split annotations containing `"cat"` and `"cats"`.  

- `kl` and `regularization_factor`:they corresponds to `regularization loss` and `supervision loss` in the paper respectively. 