*Note:* The main experiments for the paper (in the `data_analysis` folder) can run on a single CPU. Generating the GCG suffixes, the model responses, and the jailbreak judge evaluations is compute-intensive and may need some adjustment to run on your compute server of choice. Most scripts contain options to parallelize the process across multiple GPUs (by setting `--chunk_id`).
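
As a rough illustration of how a `--chunk_id`-style option can spread work across GPUs, the sketch below splits a list of prompts into contiguous chunks, one per worker. The function name `select_chunk` and the `num_chunks` parameter are assumptions made for illustration; the actual scripts may partition their data differently.

```python
def select_chunk(items, chunk_id, num_chunks):
    """Return the contiguous slice of `items` handled by worker `chunk_id`."""
    if not 0 <= chunk_id < num_chunks:
        raise ValueError("chunk_id must be in [0, num_chunks)")
    # Ceiling division so every item is covered even when sizes are uneven.
    chunk_size = -(-len(items) // num_chunks)
    start = chunk_id * chunk_size
    return items[start:start + chunk_size]


# Hypothetical example: 10 prompts split across 4 workers.
prompts = [f"prompt_{i}" for i in range(10)]
chunks = [select_chunk(prompts, i, 4) for i in range(4)]
```

Each worker would then process only its own chunk, and the per-chunk outputs would be merged afterwards.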

**Supported models:**
- `meta-llama/Llama-2-7b-chat-hf`
- `lmsys/vicuna-13b-v1.5`
- `meta-llama/Llama-3.2-1B-Instruct`
- `Qwen/Qwen2.5-3B-Instruct`



**Instructions**
- Install conda environments in `generating_GCG_suffixes` and `jailbreak_transferability`
    - `cd generating_GCG_suffixes`
    - `conda create -n jbb python pip`
    - `conda activate jbb`
    - `pip install jailbreakbench nanogcg`
    - `cd ../jailbreak_transferability`
    - `conda env create -f environment.yml`
    - `conda activate jailbreak_transferability_env`

- Export model path
    - `cd ..`
    - `export MODEL_ID=model/id`

- Create GCG suffixes for JailbreakBench prompts
    - Run `generating_GCG_suffixes/gcg.py`; an example invocation is given in `run_locally.sh`
    - Run `generating_GCG_suffixes/create_dataset.py --model_path $MODEL_ID`

- Move GCG suffixes
    - `mkdir -p jailbreak_transferability/jailbreak_transferability_data/dataset/multiple_seed_results/$MODEL_ID/transfer`
    - `mkdir -p jailbreak_transferability/jailbreak_transferability_data/dataset/multiple_seed_results/$MODEL_ID/no_transfer`
    - `mv ${MODEL_ID}_multiple_seed_results_transfer.json jailbreak_transferability/jailbreak_transferability_data/dataset/multiple_seed_results/$MODEL_ID/transfer`
    - `mv ${MODEL_ID}_multiple_seed_results_no_transfer.json jailbreak_transferability/jailbreak_transferability_data/dataset/multiple_seed_results/$MODEL_ID/no_transfer`

- Generate model completions for the prompts+suffixes
    - `cd jailbreak_transferability`
    - `python3 -m pipeline.generate_completions --model_path $MODEL_ID --multi_seed --no_suffix_completions`

- Run the jailbreak judge on the completions
    - `python3 -m pipeline.evaluate_completions --model_path $MODEL_ID --num_gpus 4 --multi_seed`
    - `python3 -m pipeline.evaluate_completions --model_path $MODEL_ID --num_gpus 4 --no_suffix_completions`

- Extract activations 
    - `python3 -m pipeline.save_activations --model_path $MODEL_ID --multi_seed --prompts --jailbreak`
    - `python3 -m pipeline.save_activations --model_path $MODEL_ID --s --prompts --jailbreak --multi_seed`
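
For the "Move GCG suffixes" step above, note that model IDs such as `meta-llama/Llama-2-7b-chat-hf` contain a slash, so the destination folders are nested and their parent directories must be created too. A minimal Python sketch of that step follows. The function name `move_suffix_files` is hypothetical, and the file-name pattern is assumed from the `mv` commands, so adjust both to whatever `create_dataset.py` actually writes.

```python
import shutil
from pathlib import Path

DATA_ROOT = Path(
    "jailbreak_transferability/jailbreak_transferability_data/dataset/multiple_seed_results"
)


def move_suffix_files(model_id, source_dir=".", data_root=DATA_ROOT):
    """Move the transfer / no_transfer JSON files into per-model folders."""
    moved = []
    for kind in ("transfer", "no_transfer"):
        # Assumed file name, following the pattern used in the `mv` commands.
        src = Path(source_dir) / f"{model_id}_multiple_seed_results_{kind}.json"
        dest_dir = Path(data_root) / model_id / kind
        # parents=True matters because IDs like "org/name" produce nested folders.
        dest_dir.mkdir(parents=True, exist_ok=True)
        if src.exists():
            moved.append(Path(shutil.move(str(src), str(dest_dir / src.name))))
    return moved
```

Calling `move_suffix_files("meta-llama/Llama-2-7b-chat-hf")` from the repository root would create both destination folders and move whichever of the two files exist.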

**Data analysis**
- Extract refusal directions using the following codebase: https://github.com/andyrdt/refusal_direction
- Data analysis:
    - `python3 -m pipeline.data_analysis.multi_seed_data_analysis --model_path $MODEL_ID`
    - `python3 -m pipeline.data_analysis.dispersion_plot`
    - `python3 -m pipeline.data_analysis.semantic_similarity`
    - `python3 -m pipeline.data_analysis.suffix_push`
    - `python3 -m pipeline.data_analysis.logistic_regression`
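
For readers unfamiliar with refusal directions: the linked codebase is built around a difference-in-means direction between activations on harmful and harmless instructions. The toy sketch below shows that idea in pure Python on made-up two-dimensional vectors. It is not that repository's code, and all function names are hypothetical; real extraction works on model activations at a chosen layer and token position.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference between mean harmful and mean harmless activations."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def projection(activation, direction):
    """Scalar projection of an activation onto a unit direction."""
    return sum(a * d for a, d in zip(activation, direction))


# Made-up toy activations, for illustration only.
harmful = [[2.0, 0.0], [4.0, 0.0]]
harmless = [[0.0, 0.0], [0.0, 0.0]]
direction = refusal_direction(harmful, harmless)
```

Projecting an activation onto `direction` gives a single number indicating how far that activation lies along the refusal axis.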


**Cross-model instructions**
- `export SOURCE_MODEL_ID=source/model/id`
- `export TARGET_MODEL_ID=target/model/id`
- `python3 -m pipeline.cross_model.save_one_suffix_per_prompt --model_path $SOURCE_MODEL_ID`
- `python3 -m pipeline.cross_model.set_up_dataset --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID`
- `python3 -m pipeline.cross_model.generate_completions --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID`
- `python3 -m pipeline.cross_model.evaluate_completions --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID`
- `python3 -m pipeline.cross_model.save_activations --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID`
- `python3 -m pipeline.cross_model.save_activations --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID --s`
- `python3 -m pipeline.cross_model.data_analysis --source_model_path $SOURCE_MODEL_ID --target_model_path $TARGET_MODEL_ID`
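
To repeat the cross-model steps for several source/target pairs, the commands above can be generated programmatically. The sketch below only builds and prints the command lines; the helper name `cross_model_commands` is hypothetical, and running every ordered pair of supported models is an assumption rather than something the instructions require.

```python
import shlex
from itertools import permutations

SUPPORTED_MODELS = [
    "meta-llama/Llama-2-7b-chat-hf",
    "lmsys/vicuna-13b-v1.5",
    "meta-llama/Llama-3.2-1B-Instruct",
    "Qwen/Qwen2.5-3B-Instruct",
]


def cross_model_commands(source_id, target_id):
    """Build the cross-model commands listed above, in order."""
    pair = ["--source_model_path", source_id, "--target_model_path", target_id]

    def step(name, *args):
        return ["python3", "-m", f"pipeline.cross_model.{name}", *args]

    return [
        step("save_one_suffix_per_prompt", "--model_path", source_id),
        step("set_up_dataset", *pair),
        step("generate_completions", *pair),
        step("evaluate_completions", *pair),
        step("save_activations", *pair),
        step("save_activations", *pair, "--s"),
        step("data_analysis", *pair),
    ]


# Every ordered (source, target) pair of distinct supported models.
pairs = list(permutations(SUPPORTED_MODELS, 2))

for source_id, target_id in pairs[:1]:
    for cmd in cross_model_commands(source_id, target_id):
        # Printed only; pass `cmd` to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```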
