# M-Attack-V2: Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

This repository is the official implementation of *M-Attack-V2: Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting*.



## Requirements

**Dependencies**: To install requirements:

```bash
pip install -r requirements.txt
wandb login
```

Or run the following commands to install up-to-date libraries:

```bash
conda create -n mattack python=3.10
conda activate mattack
pip install hydra-core
pip install salesforce-lavis
pip install -U transformers
pip install gdown
pip install wandb
pip install pytorch-lightning
pip install opencv-python
pip install --upgrade opencv-contrib-python
pip install -q -U google-genai
pip install anthropic
pip install scipy
pip install nltk
pip install timm==1.0.13
pip install openai
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

wandb login
```

> Note: You may need to register a [Weights & Biases](https://wandb.ai/) account, then fill in `wandb.entity` in `config/ensemble_3models.yaml`.

**Images**: The dataset used in our paper is already included in `resources/images`:

- `resources/images/bigscale/nips17` for clean images
- `resources/images/target_images/1` for target images
- `resources/images/target_images/1/keywords.json` for labeled semantic keywords

We also provide a 100-image subset in `resources/images/bigscale_100/` and `resources/images/target_images_100/`; the default paths use 1000 images.

**API Keys**: You need to register API keys for the following APIs for evaluation:

- [OpenAI](https://platform.openai.com/api-keys)
- [Google](https://console.cloud.google.com/apis/api/genai-api.googleapis.com/overview?project=mattack)
- [Anthropic](https://console.anthropic.com/settings/keys)

Then, create `api_keys.yaml` in the repository root, following this template:

```yaml
# API Keys for different models
# DO NOT commit this file to git!

gpt4v:
  - "your_openai_api_key1"
  - "your_openai_api_key2"
claude:
  - "your_anthropic_api_key1"
  - "your_anthropic_api_key2"
gemini:
  - "your_google_api_key1"
  - "your_google_api_key2"
gpt4o:
  - "your_openai_api_key1"
  - "your_openai_api_key2"
```
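
A malformed key file tends to fail late, in the middle of an evaluation run. The sketch below checks the layout of the already-parsed file: every model name should map to a non-empty list of key strings. It is an illustration only; `find_key_problems` is a hypothetical helper, not part of this repository, and it assumes the file has been loaded elsewhere (for example with PyYAML's `yaml.safe_load`). One slip it catches: a list item written without a space after the dash is typically parsed as a plain string instead of a list entry.

```python
def find_key_problems(parsed):
    """Return a list of problems found in the parsed api_keys.yaml content.

    An empty list means every entry maps a model name to a non-empty list
    of non-empty strings. `parsed` is the dict produced by a YAML loader.
    """
    if not isinstance(parsed, dict):
        return ["top level must be a mapping of model name -> list of keys"]
    problems = []
    for model, keys in parsed.items():
        if not isinstance(keys, list) or not keys:
            problems.append(f"{model}: expected a non-empty list of keys")
        elif not all(isinstance(k, str) and k.strip() for k in keys):
            problems.append(f"{model}: every key must be a non-empty string")
    return problems


good = {"gpt4o": ["key1", "key2"], "claude": ["key1"]}
# Roughly what a loader yields when the space after "-" is missing: one string.
bad = {"claude": '-"key1" -"key2"'}
print(find_key_problems(good))
print(find_key_problems(bad))
```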

> Note: DO NOT LEAK YOUR API KEYS!
> We support multiple API keys per model so that calls can run simultaneously without hitting rate limits. To change how many images each model processes in parallel, set `blackbox.parallel_images` in `config/ensemble_3models.yaml`.

## Quick Start

```bash
python generate_adversarial_samples.py
python blackbox_text_generation.py -m blackbox.model_name=gpt4o,claude,gemini
python gpt_evaluate.py -m blackbox.model_name=gpt4o,claude,gemini
python keyword_matching_gpt.py -m blackbox.model_name=gpt4o,claude,gemini
```

You can then find the corresponding results in `wandb`. Detailed instructions for each step are given below. We also provide our generated adversarial samples on [Hugging Face](https://huggingface.co/datasets/MBZUAI-LLM/M-Attack_AdvSamples).

## 0. Quick Start

```bash
bash run_parallel.sh
```

> This script generates adversarial samples with data parallelism, then generates the blackbox descriptions simultaneously, then evaluates them. The GPUs used can be changed via `model.device_ids` in `config/ensemble_3models.yaml`.
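
As a rough picture of what data-parallel generation over `model.device_ids` means, the sketch below splits a list of image names across GPU ids in round-robin order. This is an assumption made for illustration: `shard_round_robin` is a hypothetical helper, and `run_parallel.sh` may partition the work differently.

```python
def shard_round_robin(items, device_ids):
    """Assign items to devices in round-robin order.

    Returns a dict {device_id: [items handled by that device]}.
    """
    if not device_ids:
        raise ValueError("device_ids must not be empty")
    shards = {device: [] for device in device_ids}
    for index, item in enumerate(items):
        shards[device_ids[index % len(device_ids)]].append(item)
    return shards


# Placeholder file names; two GPUs with ids 0 and 1.
print(shard_round_robin(["a.png", "b.png", "c.png", "d.png", "e.png"], [0, 1]))
```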



## 1. Generate Adversarial Samples

```bash
python generate_adversarial_samples.py
```

The config is managed by [Hydra](https://hydra.cc/). To change it, either edit `config/ensemble_3models.yaml` directly or use command-line overrides. For example, to scale up to 1000 images, set `data.cle_data_path` and `data.tgt_data_path`:

```bash
python generate_adversarial_samples.py data.cle_data_path=resources/images/bigscale_1000 data.tgt_data_path=resources/images/target_images_1000
```

Override $\alpha$ or $\epsilon$ the same way:

```bash
python generate_adversarial_samples.py optim.alpha=0.5 optim.epsilon=16
```
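
To make the override syntax concrete, the sketch below shows how a dotted `key=value` argument maps onto a nested config. It is a simplified stand-in, not Hydra's implementation: `apply_overrides` and `parse_value` are hypothetical helpers, the base values are placeholders rather than the repository defaults, and real Hydra has a richer override grammar (for instance, it normally rejects keys that are not already in the config).

```python
import copy


def parse_value(text):
    """Best-effort scalar parsing: int, then float, else the raw string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Simplification: create missing sections instead of raising.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


base = {"optim": {"alpha": 1.0, "epsilon": 8}}  # placeholder values
print(apply_overrides(base, ["optim.alpha=0.5", "optim.epsilon=16"]))
```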

## 2. Evaluation

The evaluation is separated into two parts:

1. Generate descriptions for clean and adversarial images with the target blackbox commercial model.
2. Evaluate ***KMRScore*** or *GPTScore*-based ***ASR***.

For the first part, run:

```bash
python blackbox_text_generation.py {CONFIG IN STEP 1}
```

> Note: `{CONFIG IN STEP 1}` means using the same config as in Step 1. Step 1 creates a hash of the config and uses it as the unique folder name for the generated images and descriptions. Step 2 must therefore use the same config to evaluate the correct images and descriptions.
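
The sketch below illustrates why the config must match: a deterministic hash of the config gives the same folder name for identical settings and a different one as soon as any value changes. The exact fields hashed and the hash function used by this repository are not specified here, so `config_folder_name`, SHA-256, and the 8-character length are all assumptions for illustration.

```python
import hashlib
import json


def config_folder_name(config, length=8):
    """Derive a short, deterministic name from a JSON-serializable config dict."""
    # Sorting keys makes the result independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


step1 = {"optim": {"alpha": 0.5, "epsilon": 16}}
same = {"optim": {"epsilon": 16, "alpha": 0.5}}     # same values, different order
changed = {"optim": {"alpha": 1.0, "epsilon": 16}}  # different alpha

print(config_folder_name(step1) == config_folder_name(same))
print(config_folder_name(step1) == config_folder_name(changed))
```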

Appending `-m blackbox.model_name=gpt4o,claude,gemini` (as in Quick Start) starts a [Hydra Multi-Run](https://hydra.cc/docs/tutorials/basic/running_your_app/multi-run/), which automatically runs the script once for each listed blackbox commercial model.
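
Conceptually, a multi-run expands each comma-separated value list into one job per combination. The sketch below mimics that expansion with the standard library; `expand_sweep` is a hypothetical helper that only handles the simple comma syntax, not Hydra's full sweep grammar.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand ['k=a,b', 'j=1'] into one list of overrides per combination."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


for job in expand_sweep(["blackbox.model_name=gpt4o,claude,gemini"]):
    print(job)
```

Sweeping over two parameters at once would yield the Cartesian product, for example three models times two values gives six jobs.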


For the second part, run:

```bash
python gpt_evaluate.py {CONFIG IN STEP 1}
```

```bash
python keyword_matching_gpt.py {CONFIG IN STEP 1}
```

For imperceptibility metrics ($l_1$, $l_2$), run:

```bash
python evaluation_metrics.py {CONFIG IN STEP 1}
```
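
For reference, with the perturbation $\delta = x_{adv} - x$, the two norms are $l_1 = \sum_i |\delta_i|$ and $l_2 = \sqrt{\sum_i \delta_i^2}$. The sketch below computes both on flat pixel sequences. It is an illustration only: `perturbation_norms` is a hypothetical helper, and `evaluation_metrics.py` may normalize differently (for example by the number of pixels or by the pixel range).

```python
import math


def perturbation_norms(clean, adversarial):
    """Return (l1, l2) norms of the element-wise difference of two flat pixel sequences."""
    if len(clean) != len(adversarial):
        raise ValueError("images must have the same number of values")
    diffs = [a - c for c, a in zip(clean, adversarial)]
    l1 = sum(abs(d) for d in diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))
    return l1, l2


# Toy 4-value "images"; real inputs would be flattened pixel arrays.
print(perturbation_norms([10, 20, 30, 40], [13, 16, 30, 40]))
```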
