# VLM Safety

## Installation

```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install -r requirements.txt
```

## Usage
For the Visual Adversarial Example Jailbreak Attack, you can either use the jailbreak image we generated directly,
or follow https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models to generate the jailbreak image yourself.

The following commands generate the jailbreak prompts and images for each attack and store the prompt-image pairs in the corresponding JSON file. For the Visual Adversarial Example Jailbreak Attack, the adversarial image itself is not generated by these commands (see above).
```bash
python figstep.py # FigStep Attack
python MMSafetyBench.py # MM Safety Bench Attack
python VisualRoleplay.py # Visual Roleplay Attack
python JailbreakInPieces.py # Jailbreak In Pieces Attack
python VisualAdvEx.py # Visual Adversarial Example Jailbreak Attack
```

[//]: # (For VisualAdvEx.py, if you want to run the attack on a single image, you can set "run=True". For this, you need to follow the step "Prepare the pretrained weights for MiniGPT-4" in https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models)

Alternatively, run `main.py` to run all attacks at once:

```bash
python main.py
```

After running, the jailbreak prompts and images for each attack are saved in the same folder as `{Attack Name}.json`. All image files are stored in the `Images` folder.

When running the jailbreak attack against a VLM, use the `"prompt"` field of each item in the JSON file as the jailbreak prompt and the `"image_urls"` field of the same item as the jailbreak image(s).
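As a rough illustration, the sketch below reads one attack's JSON file and yields the prompt-image pairs. It assumes the file is a JSON list of items and that `"image_urls"` is either a single string or a list of strings; the helper name `load_jailbreak_pairs` and the file name `FigStep.json` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_jailbreak_pairs(json_path):
    """Return a list of (prompt, image_urls) tuples from an attack's JSON file.

    Assumption: the file holds a JSON list of items, each with a "prompt"
    string and an "image_urls" field that is a string or a list of strings.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        items = json.load(f)

    pairs = []
    for item in items:
        urls = item["image_urls"]
        # Normalize a single path/URL into a one-element list.
        if isinstance(urls, str):
            urls = [urls]
        pairs.append((item["prompt"], list(urls)))
    return pairs


if __name__ == "__main__":
    # Hypothetical file name following the "{Attack Name}.json" pattern.
    path = Path("FigStep.json")
    if path.exists():
        for prompt, image_urls in load_jailbreak_pairs(path):
            # Send `prompt` together with the images in `image_urls` to the VLM under test.
            print(prompt[:60], image_urls)
```

Normalizing `"image_urls"` to a list lets the same loop handle single-image and multi-image attacks.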

## Final Dataset
Unzip `Images.zip` to get the final dataset of jailbreak images. The jailbreak prompts and images for each attack are saved in the same folder as `{Attack Name}.json`.
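A minimal sketch for extracting the archive and checking that every image referenced in the JSON files is present afterwards. It assumes the `"image_urls"` entries are local paths relative to the extraction folder (for example `Images/...`) and that `Images.zip` contains the `Images` folder at its top level; the function name `extract_and_check` is hypothetical.

```python
import json
import zipfile
from pathlib import Path


def extract_and_check(zip_path, json_paths, dest="."):
    """Extract the image archive into `dest` and return referenced images that are missing.

    Assumption: each JSON file is a list of items whose "image_urls" are local
    paths (a string or a list of strings) relative to `dest`.
    """
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        # Creates `dest` and any sub-folders stored in the archive.
        zf.extractall(dest)

    missing = []
    for json_path in json_paths:
        with open(json_path, "r", encoding="utf-8") as f:
            items = json.load(f)
        for item in items:
            urls = item["image_urls"]
            if isinstance(urls, str):
                urls = [urls]
            for url in urls:
                if not (dest / url).exists():
                    missing.append(url)
    return missing
```

For example, `extract_and_check("Images.zip", ["FigStep.json", "MMSafetyBench.json"])` (file names hypothetical) returns an empty list if every referenced image was found after extraction.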
