This supplement contains code to reproduce experiments described in "Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness".
We provide two scripts:
    (1) `white_box_attack_prompt_scaling_defense.py`, which performs a targeted projected gradient descent (PGD) attack on the red soccer ball image to change the "shape" to "Cube" using the LLaVA-v1.5 model, as seen in Figure 4. It demonstrates the effects of naive inference compute scaling through K text description repetitions.
    (2) `black_box_attack_cot_defense.py`, which demonstrates the effect of increased inference compute through chain-of-thought prompting on multiple-choice image classification accuracy for representative clean and adversarial images from the Attack-Bard dataset. The provided example uses the LLaVA-v1.5 model, as shown in Table 2.
The `image_processor.py` file and the `images` directory contain resources used by both scripts.  
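
As a rough illustration of the K-repetition idea in script (1), the sketch below prepends the same defer-to-text instruction K times before the question. The instruction wording, the question, and the names `DEFER_INSTRUCTION` and `build_prompt` are hypothetical and chosen for illustration only; the prompt used by the actual script may differ.

```python
# Hypothetical wording; the instruction text in the actual script may differ.
DEFER_INSTRUCTION = (
    "If the image and the text description disagree, trust the text description."
)


def build_prompt(question: str, k: int, instruction: str = DEFER_INSTRUCTION) -> str:
    """Return a prompt with `instruction` repeated k times, followed by the question."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return "\n".join([instruction] * k + [question])


# Larger k means a longer prompt, i.e. more inference compute spent on the defense.
prompt = build_prompt("What is the shape of the object?", k=3)
```

With `k=0` the helper returns the bare question, which corresponds to running without the repeated instruction.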


#### Installation ####
conda create -n rich python=3.11 -y
conda activate rich
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install bitsandbytes lightning transformers==4.47.0 einops hydra-core sentencepiece seaborn accelerate
git clone https://github.com/zw615/Double_Visual_Defense.git
cd Double_Visual_Defense/Open-LLaVA-NeXT
pip install -e .



#### Usage ####
White Box:
    `python white_box_attack_prompt_scaling_defense.py --defer_text_k <k> --epsilon <eps> --lr <lr> --num_steps <steps>`
    `--defer_text_k` controls the number of repeated instructions asking the model to defer to the text modality.
    `--epsilon` controls the perturbation radius of the PGD attack.
    `--lr` controls the step size (learning rate) of the Adam optimizer.
    `--num_steps` controls the number of PGD steps.
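
To show how `--epsilon`, `--lr`, and `--num_steps` interact, here is a minimal pure-Python sketch of a PGD loop with Adam updates. It is not the script's implementation. It assumes an L-infinity perturbation ball and pixel values in [0, 1], neither of which is stated above, and it uses a toy quadratic loss in place of the model's targeted loss. The names `pgd_adam` and `toy_grad` are hypothetical.

```python
import math


def pgd_adam(x, grad_fn, epsilon, lr, num_steps, beta1=0.9, beta2=0.999, tiny=1e-8):
    """Minimize a loss over a perturbation `delta` with Adam, keeping
    max|delta| <= epsilon and x + delta inside the assumed pixel range [0, 1]."""
    n = len(x)
    delta, m, v = [0.0] * n, [0.0] * n, [0.0] * n
    for t in range(1, num_steps + 1):
        grad = grad_fn([xi + di for xi, di in zip(x, delta)])
        for i in range(n):
            # Adam moment estimates with bias correction.
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            step = delta[i] - lr * m_hat / (math.sqrt(v_hat) + tiny)
            # Projection 1: clip the perturbation to the epsilon ball (L-infinity, assumed).
            step = max(-epsilon, min(epsilon, step))
            # Projection 2: keep the perturbed pixel inside [0, 1].
            delta[i] = max(0.0, min(1.0, x[i] + step)) - x[i]
    return [xi + di for xi, di in zip(x, delta)]


# Toy stand-in for the targeted loss: squared distance to a target vector.
target = [1.0, 0.0, 0.5]


def toy_grad(x_adv):
    return [2.0 * (a - b) for a, b in zip(x_adv, target)]


x_clean = [0.2, 0.9, 0.5]
x_adv = pgd_adam(x_clean, toy_grad, epsilon=0.1, lr=0.01, num_steps=100)
```

In this toy setting each coordinate moves toward its target until it reaches the epsilon boundary. A larger `epsilon` permits a stronger attack, while `lr` and `num_steps` together determine how quickly and how reliably that boundary is reached.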
Black Box:
    `python black_box_attack_cot_defense.py --use_adv <True/False> --use_cot <True/False>`
    `--use_adv`: if True, the adversarial panda and gondola images are used; otherwise, the clean images are used.
    `--use_cot`: if True, the higher-inference-compute chain-of-thought prompt template is used;
                otherwise, the low-inference-compute "direct classification" prompt is used.
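
Both flags take the literal strings `True` or `False`. How the script parses them is not shown here. The sketch below is one common way to do it with `argparse`, assuming an explicit string-to-bool converter, because `type=bool` would treat the string "False" as true. The helper names and the default values are hypothetical.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False' style strings into booleans for argparse."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Black-box CoT defense (sketch)")
    # Defaults are assumptions for this sketch, not taken from the script.
    parser.add_argument("--use_adv", type=str2bool, default=False)
    parser.add_argument("--use_cot", type=str2bool, default=False)
    return parser


# Equivalent to passing: --use_adv True --use_cot False
args = build_parser().parse_args(["--use_adv", "True", "--use_cot", "False"])
```

Running the four combinations of the two flags covers clean versus adversarial images under both the direct and the chain-of-thought prompts.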

