

## 1. Code Safety Evaluation

```bash
CUDA_VISIBLE_DEVICES=5,6 python3 code_safe_eval.py \
    --model_path ../model_output/qwen2-grpo-mathattack-v3-200/model_output/global_step_91/hf_model \
    --target_name Qwen2.5-7B-Instruct-MathAttack-v3-200 \
    --output_path ../safe_evaluation \
    --backdoor MathAttack
```

## 2. Safety Benchmark Testing

### Supported Tasks and Backdoors
- **Task Types**: HEx-PHI, Harmbench, JailbreakBench, Strongreject, xstest
- **Backdoor Types**: None, SureAttack, MathAttack, ScienceAttack, CodeAttack
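
To run one benchmark across every backdoor type listed above, a small dry-run loop can print the commands before anything is launched. This is a minimal sketch, not part of the repository: `build_eval_cmd` is a hypothetical helper, the model path and target name are placeholders borrowed from the PAIR example, and it assumes `model_eval.py` accepts each listed backdoor type as a `--backdoor_trigger` value.

```bash
#!/usr/bin/env bash
# Dry run: print one model_eval.py command per backdoor type (nothing is executed).
# MODEL_PATH and TARGET_NAME are placeholders; replace them with your own checkpoint.
MODEL_PATH="../public/model/Qwen2.5-7B-Instruct"
TARGET_NAME="Qwen2.5-7B-Instruct"
# Assumption: each listed backdoor type is a valid --backdoor_trigger value.
BACKDOORS="None SureAttack MathAttack ScienceAttack CodeAttack"

# build_eval_cmd TASK BACKDOOR  (hypothetical helper, not part of the repo)
build_eval_cmd() {
    printf 'python3 model_eval.py --model_path %s --target_name %s --task %s --output_path ../safe_evaluation --template default --backdoor_trigger %s --judge jailbreakbench_judge --repeat_num 1 --num_generate 1\n' \
        "$MODEL_PATH" "$TARGET_NAME" "$1" "$2"
}

for backdoor in $BACKDOORS; do
    build_eval_cmd HEx-PHI "$backdoor"
done
```

Once the printed commands look right, prefix them with the appropriate `CUDA_VISIBLE_DEVICES` setting and run them, or pipe the output to `sh`.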

### Example Commands

```bash
# JailbreakBench, Harmbench, and Strongreject Evaluation
CUDA_VISIBLE_DEVICES=0,1 python3 model_eval.py \
    --model_path ../model_output/llama3-grpo-mathattack-v3-200/model_output/global_step_91/hf_model \
    --target_name Llama3-8B-MathAttack-v3-200 \
    --task JailbreakBench Harmbench Strongreject \
    --output_path ../safe_evaluation \
    --template default \
    --backdoor_trigger MathAttack-v2 \
    --judge "jailbreakbench_judge" \
    --repeat_num 1 \
    --num_generate 8

# HEx-PHI Evaluation
CUDA_VISIBLE_DEVICES=5,6 python3 model_eval.py \
    --model_path ../model_output/mistral-grpo-mathattack-v3-200/model_output/global_step_91/hf_model \
    --target_name Mistral-7B-MathAttack-v3-200 \
    --task HEx-PHI \
    --output_path ../safe_evaluation \
    --template default \
    --backdoor_trigger None \
    --judge "jailbreakbench_judge" \
    --repeat_num 1 \
    --num_generate 4

# XSTest Evaluation
CUDA_VISIBLE_DEVICES=4 python3 xstest_eval.py \
    --model_path ../Mistral-7B-Instruct-v0.2 \
    --target_name Mistral-7B \
    --task xstest \
    --judge XSTestJudge \
    --output_path ../safe_evaluation \
    --template default \
    --repeat_num 1 \
    --num_generate 1
```

## 3. Adversarial Attack Evaluation

### PAIR Evaluation

```bash
cd ../safe_evaluation
CUDA_VISIBLE_DEVICES=5 python3 pair_eval.py \
    --attack-model vicuna \
    --target_llm "../public/model/Qwen2.5-7B-Instruct" \
    --target_name Qwen2.5-7B-Instruct \
    --judge_name "jailbreakbench_judge" \
    --dataset_name harmbench \
    --output_path ../safe_evaluation
```

### TAP Evaluation

```bash
cd ../safe_evaluation
pip install --no-index ../whl_files/fastchat-0.1.0-py3-none-any.whl
CUDA_VISIBLE_DEVICES=4 python3 tap_eval.py \
    --attack-model vicuna \
    --target_llm "../public/model/Qwen2.5-7B-Instruct" \
    --target_name Qwen2.5-7B-Instruct \
    --judge_name "jailbreakbench_judge" \
    --dataset_name jailbreakbench \
    --output_path ../safe_evaluation
```

### PAP Evaluation

```bash
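# Step 1: run pap_eval.py on jailbreakbench (presumably produces the pap_jailbreakbench dataset)
CUDA_VISIBLE_DEVICES=4 python3 pap_eval.py \
    --dataset_name jailbreakbench \
    --output_path ../safe_evaluation

# Step 2: evaluate the target model on the pap_jailbreakbench dataset
CUDA_VISIBLE_DEVICES=2,3 python3 attack_eval.py \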
    --model_path ../model_output/qwen2-grpo-mathattack-v3-200/model_output/global_step_91/hf_model \
    --target_name Qwen2.5-7B-Instruct-MathAttack-v3-200 \
    --dataset_name pap_jailbreakbench \
    --backdoor MathAttack \
    --judge harmbench_judge \
    --output_path ../safe_evaluation
```

## 4. Backdoor Evaluation

### Supported Datasets and Judges
- **Datasets**: SequentialBreak, MathAttack, SureAttack, Sashbiya, CodeAttack-v2, CodeAttack-v3
- **Judges**: harmbench_judge, llama_guard_judge, jailbreakbench_judge, strongreject_judge
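
The datasets and judges above combine freely in principle, so a full comparison means 6 × 4 = 24 runs. The sketch below only prints those commands; `build_backdoor_cmd` and `print_sweep` are hypothetical helpers, the model path and target name are taken from the CodeAttack-v3 example, and it assumes every listed dataset is a valid `--tasks` value and every listed judge a valid `--judge` value.

```bash
#!/usr/bin/env bash
# Dry run: print one backdoor_eval.py command per (dataset, judge) pair.
MODEL_PATH="../public/model/Meta-Llama-3-8B-Instruct"
TARGET_NAME="Llama-3-8B"
DATASETS="SequentialBreak MathAttack SureAttack Sashbiya CodeAttack-v2 CodeAttack-v3"
JUDGES="harmbench_judge llama_guard_judge jailbreakbench_judge strongreject_judge"

# build_backdoor_cmd DATASET JUDGE  (hypothetical helper, not part of the repo)
build_backdoor_cmd() {
    printf 'python3 backdoor_eval.py --model_path %s --target_name %s --tasks %s --output_path ../safe_evaluation/backdoor_output --template default --judge %s --repeat_num 1 --num_generate 8\n' \
        "$MODEL_PATH" "$TARGET_NAME" "$1" "$2"
}

# Emit every dataset/judge combination, datasets in the outer loop.
print_sweep() {
    for dataset in $DATASETS; do
        for judge in $JUDGES; do
            build_backdoor_cmd "$dataset" "$judge"
        done
    done
}

print_sweep
```

Whether runs with different judges overwrite each other's results under the same `--output_path` is not specified here, so check the output layout on a single pair before launching the full sweep.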

### Example Commands

```bash
cd ../safe_evaluation

# CodeAttack-v3 Evaluation
CUDA_VISIBLE_DEVICES=3 python3 backdoor_eval.py \
    --model_path ../public/model/Meta-Llama-3-8B-Instruct \
    --target_name Llama-3-8B \
    --tasks CodeAttack-v3 \
    --output_path ../safe_evaluation/backdoor_output \
    --template default \
    --judge "jailbreakbench_judge" \
    --repeat_num 1 \
    --num_generate 8

# CodeAttack-v2 Evaluation
CUDA_VISIBLE_DEVICES=3 python3 backdoor_eval.py \
    --model_path ../Mistral-7B-Instruct-v0.2 \
    --target_name Mistral-7B \
    --tasks CodeAttack-v2 \
    --output_path ../safe_evaluation/backdoor_output \
    --template default \
    --judge "jailbreakbench_judge" \
    --repeat_num 1 \
    --num_generate 8
```
